The Power of Slightly More than One Sample in
Randomized  Load  Balancing 

Lei  Ying,  R.  Srikant  and  Xiaohan  Kang 


Abstract — In  many  computing  and  networking  applications, 
arriving  tasks  have  to  be  routed  to  one  of  many  servers,  with 
the  goal  of  minimizing  queueing  delays.  When  the  number  of 
processors  is  very  large,  a  popular  routing  algorithm  works  as 
follows:  select  two  servers  at  random  and  route  an  arriving  task 
to  the  least  loaded  of  the  two.  It  is  well-known  that  this  algorithm 
dramatically  reduces  queueing  delays  compared  to  an  algorithm 
which  routes  to  a  single  randomly  selected  server.  In  recent  cloud 
computing  applications,  it  has  been  observed  that  even  sampling 
two  queues  per  arriving  task  can  be  expensive  and  can  even 
increase  delays  due  to  messaging  overhead.  So  there  is  an  interest 
in  reducing  the  number  of  sampled  queues  per  arriving  task.  In 
this  paper,  we  show  that  the  number  of  sampled  queues  can  be 
dramatically  reduced  by  using  the  fact  that  tasks  arrive  in  batches 
(called  jobs).  In  particular,  we  sample  a  subset  of  the  queues 
such  that  the  size  of  the  subset  is  slightly  larger  than  the  batch 
size  (thus,  on  average,  we  only  sample  slightly  more  than  one 
queue  per  task).  Once  a  random  subset  of  the  queues  is  sampled, 
we  propose  a  new  load  balancing  method  called  batch-filling  to 
attempt  to  equalize  the  load  among  the  sampled  servers.  We  show 
that  our  algorithm  maintains  the  same  asymptotic  performance 
as  the  so-called  power-of-two-choices  algorithm  while  using  only 
half  the  number  of  samples. 


I.  Introduction 

In  many  computing  and  networking  applications,  including 
routing,  hashing,  and  load  balancing  (see  [14]),  a  router  (also 
called  scheduler)  has  to  route  arriving  tasks  to  one  of  many 
servers  with  the  goal  of  minimizing  queueing  delays.  Such 
applications  have  been  increasingly  relevant  recently,  due  to 
the  explosive  growth  of  cloud  computing  where  a  large  number 
of  servers  in  a  data  center  are  used  to  process  a  large  volume  of 
tasks.  Ideally,  one  would  like  the  router  to  consider  the  queue 
lengths  at  all  the  servers  and  select  the  shortest  of  the  queues 
since  this  is  delay  optimal,  at  least  in  certain  traffic  regimes 
(see  [6]  and  references  cited  within).  However,  sampling  all  the 
queues  can  be  expensive  when  the  number  of  servers  is  very 
large.  Motivated  by  such  considerations,  load  balancing  in  the 
large-server  limit  was  studied  in  [9],  [11],  [19].  The  key  result 
in  those  papers  is  that  queueing  delays  can  be  dramatically 
reduced  by  sampling  two  servers  for  each  task,  instead  of  just 
one,  and  routing  the  task  to  the  shorter  of  the  two  queues. 
We  will  call  this  basic  algorithm  the  power-of-two-choices 
algorithm  as  in  prior  work.  These  results  have  been  extended 
in various directions. In [3], [4], the results have been extended
to the case of heavy-tailed distributions; in [17], [18], the
effect of resource pooling has been considered; and the case
of heterogeneous servers operating under the processor-sharing
discipline has been treated in [12].

In this paper, we are motivated by cloud computing applications
in which each arrival is a job consisting of many
tasks,  each  of  which  can  be  executed  in  parallel  in  possibly 
different  servers.  In  queueing  theory  parlance,  this  model 
differs  from  the  models  mentioned  earlier  due  to  the  fact  that 
task arrivals occur in batches, i.e., each job corresponds to
a  batch  arrival  of  tasks.  We  note  the  terminology  we  use 
here:  a  job  is  a  collection  of  tasks,  and  each  task  can  be 
routed  independently  of  each  other.  Such  a  model  arises  in 
the  well-known  Map/Reduce  framework,  for  example,  where 
each  Map  job  consists  of  many  Map  tasks  (here,  we  do  not 
consider  the  Reduce  phase  of  the  job).  More  generally,  any 
parallel  processing  computer  system  will  have  job  arrivals 
which  consist  of  many  tasks  which  can  be  executed  in  parallel. 
The  question  of  interest  is  whether  the  fact  that  there  are  batch 
arrivals  can  be  exploited  to  significantly  reduce  the  sample 
complexity.  Here,  by  sample  complexity,  we  mean  the  number 
of queues sampled per arriving task to make a routing decision.
Our  motivation  for  this  problem  arises  from  a  study  of  batch 
arrivals  to  computing  clusters  presented  in  [13],  where  the 
authors  observe  a  phenomenon  called  messaging  overhead, 
i.e.,  the  overhead  of  providing  task  backlog  feedback  can 
slow  down  servers  and  increase  the  delays  experienced  by 
tasks/jobs.  Further,  [13]  proposes  an  algorithm  which  achieves 
better  performance  than  the  power-of-two-choices  algorithm 
when  both  of  them  use  the  same  number  of  samples  per 
arriving  task.  In  this  paper,  we  observe  that  this  basic  algorithm 
for  batch  arrivals  suggested  in  [13]  does  not  work  well  in 
all  traffic  conditions.  Moreover,  we  present  a  new  algorithm 
which  exploits  batch  arrivals  in  a  manner  in  which  it  provides 
much  better  sample  complexity  than  the  power-of-two-choices 
algorithm  for  the  same  delay  performance.  Further,  when 
both  algorithms  are  allowed  the  same  sample  complexity,  our 
algorithm  achieves  better  delay  performance. 

Our  main  contributions  are  as  follows: 

1) We present an algorithm which samples $md$ queues, where
$m$ is the batch size (i.e., number of tasks) of a job. Thus,
$d$ is the number of sampled queues per task. The tasks are
routed to the queues using a novel algorithm called water-filling.

2)  We  first  study  our  algorithm  and  other  previously  proposed 
algorithms  using  a  mean-field  analysis.  We  show  that, 
for any $d > 1$, we achieve better performance than the
traditional power-of-two-choices algorithm in the large-systems
regime. Thus, the mean-field analysis shows that,
in the large-systems regime, we can reduce the number of
samples per arriving task dramatically: from $d = 2$ to any
$d > 1$.

3)  We  then  justify  the  mean-field  analysis.  In  particular,  we 
first  show  that  the  stochastic  system  dynamics  converge 
to  deterministic  differential  equations  in  the  large-systems 
limit  for  any  finite  t.  Our  proof  here  is  motivated  by  the 
proof  of  a  celebrated  result  on  density-dependent  Markov 
processes  called  Kurtz’s  theorem  [7],  but  our  model  is 
somewhat  nonstandard  and  requires  additional  steps  which 
are  not  needed  in  the  original  Kurtz’s  theorem.  Further, 
using a novel Lyapunov function, we show that the system
of differential equations converges to an equilibrium
described  by  the  mean-field  analysis.  Then  by  showing 
the  interchange  of  the  limits,  we  prove  the  stationary 
distribution  of  the  queue  size  distribution  converges  to  the 
solution  of  the  differential  equations. 

4)  Finally,  we  perform  extensive  simulations  to  justify  that  our 
analytical  conclusions  are  indeed  valid  in  large,  but  finite, 
systems.  In  particular,  simulations  show  that  our  algorithm 
with  just  one  sample  per  task  on  average,  achieves  the 
same  job  delay  performance  as  the  power-of-two-choices 
algorithm  and  dramatically  reduces  the  delay  compared  to 
the algorithm proposed in [13].

II.  Problem  Statement  and  Main  Results 

We  consider  a  computing  cluster  with  n  identical  servers 
and  a  central  scheduler  as  shown  in  Figure  1.  Each  server  can 
process  one  task  at  a  time.  Tasks  arrive  at  the  scheduler  in 
batches (also called jobs). Each batch consists of $m$ tasks and
the job arrival process is a Poisson process with rate $n\lambda/m$. We
want the batch size to be not too small, so we assume that $m =
\Theta(\log n)$ and that $m$ is an increasing function of $n$. For simplicity, we
consider  a  deterministic  batch  size  here,  but  the  results  in  the 
paper  can  be  extended  to  random  batch  sizes  as  well  in  a 
straightforward  manner,  as  will  be  discussed  in  the  extended 
version  of  the  paper.  Furthermore,  the  results  of  this  paper 
hold  when  the  system  has  multiple  distributed  schedulers  and 
the  job  arrivals  on  these  schedulers  are  independent  Poisson 
processes with aggregated rate $n\lambda/m$. This is because the sum
of  independent  Poisson  processes  is  Poisson.  The  scheduler 
dispatches  the  tasks  to  the  servers  when  a  job  arrives.  The 
service  times  of  the  tasks  are  exponentially  distributed  with 
mean  1,  and  are  independent  across  tasks.  When  a  task  arrives 
at  a  server,  it  is  processed  immediately  if  the  server  is  idle  or 
waits  in  a  FIFO  (first-in,  first-out)  queue  if  the  server  is  busy. 

We first describe the traditional power-of-d-choices algorithm
(which is a simple generalization of the power-of-two-choices
algorithm mentioned in the previous section) and another
previously proposed idea called the batch-sampling algorithm.
Then, we present our idea, which we call batch-filling and which
combines batch sampling with our new load balancing technique
called water-filling.

The-Power-of-d-Choices [10], [19]: When a batch of $m$ tasks
arrives, the scheduler probes $d$ servers uniformly at random for
each task. The task is routed to the least loaded of the probed servers.

Fig. 1: A computing cluster with $n$ servers and a central
scheduler (each job consists of $m$ tasks)

Batch-Sampling [13]: When a batch of $m$ tasks arrives, the
scheduler probes $dm$ servers uniformly at random to acquire
their queue lengths. The $m$ tasks are added to the least
loaded $m$ servers, one task per server.

In this paper, we propose a new load-balancing algorithm,
named batch-filling: we sample queues as in the batch-sampling
algorithm, but the way that tasks are routed to servers
uses a different procedure which we call water-filling.

Batch-Filling: When a batch of $m$ tasks arrives, the scheduler
probes $dm$ servers uniformly at random to acquire their queue
lengths. The $m$ tasks are added to the $dm$ servers using
water-filling; specifically, the tasks are dispatched one by one to
the least loaded server, where the queue length of a server is
updated after it receives a task.

Remark:  In  batch-filling,  the  first  task  in  a  batch  is  routed 
to  the  least  loaded  server  among  the  sampled  servers,  i.e., 
the  one  with  the  smallest  number  of  tasks  in  its  queue.  The 
key  difference  compared  to  batch-sampling  is  that  the  server’s 
queue  size  is  updated  after  this  (which  means  that  this  server 
may  no  longer  be  the  least-loaded  in  the  sampled  servers), 
and  then  the  next  task  in  the  batch  is  again  routed  to  the 
least  loaded  server,  and  so  on.  As  we  will  see  later,  this  small 
change  to  the  routing  algorithm  has  dramatic  consequences 
to  the  sample  complexity  of  the  algorithm.  In  all  algorithms, 
at  each  step,  ties  are  broken  at  random  if  there  is  more  than 
one  least-loaded  server. 
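
The three dispatching rules can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of a binary heap for water-filling, and the convention that the caller supplies the list of probed server indices (for example, `rng.sample(range(n), k)` with `k` an integer close to $dm$) are all assumptions made here.

```python
import heapq
import random


def power_of_d(queues, m, d, rng):
    """Route each of m tasks to the shortest of d servers probed for that task."""
    for _ in range(m):
        probed = rng.sample(range(len(queues)), d)
        shortest = min(queues[k] for k in probed)
        # Ties are broken uniformly at random.
        target = rng.choice([k for k in probed if queues[k] == shortest])
        queues[target] += 1


def batch_sampling(queues, m, probed, rng):
    """Give one task to each of the m least loaded probed servers."""
    # The random second key breaks ties among equal queue lengths.
    order = sorted(probed, key=lambda k: (queues[k], rng.random()))
    for k in order[:m]:
        queues[k] += 1


def batch_filling(queues, m, probed, rng):
    """Water-filling: send tasks one by one to the currently shortest probed queue."""
    heap = [(queues[k], rng.random(), k) for k in probed]
    heapq.heapify(heap)
    for _ in range(m):
        length, _, k = heapq.heappop(heap)
        queues[k] = length + 1  # the queue length is updated after each task
        heapq.heappush(heap, (length + 1, rng.random(), k))
```

For four probed servers with queue lengths 1, 1, 4, 4 and a batch of three tasks, `batch_filling` leaves the two long queues untouched and raises the two short queues to 2 and 3 in some order, whereas `batch_sampling` also sends one task to a queue of length 4.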

In this paper, $d$ is called the probe ratio, which is assumed
to be a constant independent of $n$. As in [10], [19], we will
study the different algorithms in the large-systems limit, i.e.,
as $n \to \infty$, since a data center today may consist of tens of
thousands  of  servers.  The  main  theoretical  results  which  will 
be  established  in  the  paper  are  summarized  in  Table  I,  and  we 
discuss  them  below. 

• The expected per-task delay of batch-filling with any $d > 1$
is smaller than that of both batch-sampling with $d = 2$ and
the-power-of-two-choices when $\lambda \to 1^-$. In other words, batch-filling
outperforms the other two algorithms by sampling
slightly more than one server per task, hence the title of the
paper.

TABLE I: Expected per-task delays and maximum queue sizes of the three scheduling algorithms.
The order notation $O_\lambda(\cdot)$ is defined as $1/(1-\lambda) \to \infty$, i.e., $\lambda \to 1^-$. Pod stands for the-power-of-d-choices. In batch-filling
and batch-sampling, $d > 1$; in the-power-of-d-choices, $d$ is an integer and $d \ge 2$.

Expected per-task delay
  Batch-Filling:   $-\frac{1}{\lambda}\frac{\log(1-\lambda)}{\log(1+\lambda d)} + O_\lambda(1)$
  Batch-Sampling:  $-\frac{1}{\lambda}\frac{\log(1-\lambda)}{\log(\lambda d)} + O_\lambda(1)$
  Pod:             $-\frac{1}{\lambda}\frac{\log(1-\lambda)}{\log(\lambda d)} + O_\lambda(1)$

Maximum queue size in the system
  Batch-Filling:   $\left\lceil -\frac{\log(1-\lambda)}{\log(1+\lambda d)} \right\rceil$
  Batch-Sampling:  $\left\lceil \frac{\log\frac{d-1}{d(1-\lambda)}}{\log(\lambda d)} \right\rceil$ if $\lambda d \ne 1$; $\left\lceil \frac{1}{1-\lambda} \right\rceil - 1$ if $\lambda d = 1$
  Pod:             $\infty$

• The size of the longest queue in the system under the-power-of-d-choices
is unbounded for any $d \ge 2$ because the
stationary queue length distribution has unbounded support.
The sizes of the longest queues under both batch-filling and
batch-sampling are finite because the stationary distributions
have bounded support. The longest queue under batch-filling
with $d > 1$ is smaller than that of batch-sampling with $d = 2$
when $\lambda \to 1^-$. When $d$ is close to 1, the size of the longest
queue under batch-filling is much smaller than that under
batch-sampling (7 versus 26 when $d = 1.1$ and $\lambda = 0.99$).

•  The  small  and  bounded  size  of  the  queues  under  batch  filling 
has  important  consequences.  A  job  is  said  to  be  completed 
when  all  the  tasks  in  the  job  are  completed.  Since  the  tail  of 
the  queue  size  is  cut  off,  this  has  the  effect  of  significantly 
reducing  job  completion  delays,  as  we  will  see  later  in  the 
simulations  section. 

• The above theoretical results suggest that the sample complexity
(i.e., the number of samples per arriving task) can be
significantly reduced under batch-filling. On the other hand,
the computational complexity is slightly increased compared
to batch-sampling since we have to compare the
sizes of the smallest queues and the next smallest queues
each time a task is routed. However, this increase in computational
complexity is a cost to be paid at the router, whereas
increased sample complexity slows down the servers since
they have to send queue length feedback, which takes time
away from their primary role of processing tasks. This is
the reason why sample complexity is a more significant
issue than computational complexity in data centers
(although we do not want the computational complexity to
be very high either). The batch-sampling algorithm performs
$O(dm \log m)$ computations per batch, which corresponds
to a sorting operation, while the batch-filling algorithm performs
an additional $2m$ operations since it has to keep track of the
queue lengths of the smallest queues and the next smallest
queue.


Theorem 1. The Markov chain $Q^{(n)}(t)$ is positive recurrent
under batch-filling. Furthermore, there exists a constant $c > 0$,
independent of $n$, such that

\[ E\left[ \sum_{k=1}^{n} Q_k^{(n)} \right] \le c n \]

for any $n$, where $Q_k^{(n)}$ denotes the queue length of server $k$
in the steady state.


The proof of this theorem is presented in the appendix. Let
$\pi_i^{(n)}$ denote the stationary distribution of queue $k$, i.e., the
probability that the queue size is $i$ at server $k$. Here, the
index $k$ is ignored because the stationary distributions are
identical across servers. According to the theorem above,
we have $\sum_i i \pi_i^{(n)} \le c$, which further implies that $\pi_i^{(n)} \to 0$
as $i \to \infty$ and $\sum_{j \ge i} \pi_j^{(n)} \to 0$ as $i \to \infty$. We remark that

one  challenge  in  proving  that  the  stochastic  system  dynamics 
converge  to  deterministic  differential  equations  lies  in  that 
the  system  is  an  infinite-dimensional  system.  We  will  utilize 
the  facts  mentioned  above  to  overcome  this  challenge  in  the 
proofs. 

The mean-field analysis proceeds as follows. Assume the $n$
queues are in the steady state, and further assume that the
queue lengths are independently and identically distributed
(i.i.d.) with distribution $\pi$. This i.i.d. assumption in the mean-field
analysis will be validated later in Section IV in the large-systems
limit. Now consider the queue evolution of one server
in the system. Each queue forms an independent Markov chain
as shown in Figure 2, and the transition
rates will be determined by the particular strategy used to
route  tasks  to  servers.  We  will  derive  the  transition  rates  for 
each  of  the  strategies  described  earlier,  namely  batch  filling, 
batch  sampling,  and  the  power-of-d-choices,  in  the  rest  of  this 
section. 


III.  Mean-Field  Analysis 
In  this  section,  we  will  use  mean-field  analysis  to  study  the 
stationary  distributions  of  the  queue  lengths  under  batch-filling 
and  batch-sampling.  The  results  will  be  further  validated  using 
a  proof  inspired  by  the  proof  of  Kurtz’s  theorem  in  Section  IV. 
Let $Q_k^{(n)}(t)$ denote the queue length of the $k$th server at time $t$
in a system with $n$ queues. It can be easily verified that $Q^{(n)}(t)$
is  an  irreducible  and  nonexplosive  Markov  chain,  and  using 
the  standard  Foster-Lyapunov  theorem  (see,  for  example,  [15]) 
it  can  be  verified  that  the  Markov  chain  is  positive  recurrent 
and  hence,  has  a  unique  stationary  distribution. 


Fig. 2: The Markov chain representing the nth system in the
mean-field analysis

A. The stationary distribution under batch-filling

We first consider the batch-filling algorithm. The down-crossing
transition rate from state $i$ to $i - 1$ is 1 for all $i \ge 1$,
i.e.,

\[ q^{(n)}_{i,i-1} = 1, \]

because the processing time of a task is exponentially distributed
with mean 1. The up-crossing transition rate from
state $i$ to state $j$ for $j > i$ is

\[ q^{(n)}_{i,j} = \frac{n}{m}\lambda \times \frac{dm}{n} \times \sum_{\phi} P(\phi) \times P(j \mid \phi, i) = d\lambda \sum_{\phi} P(\phi) P(j \mid \phi, i). \tag{1} \]

In the expression above,

• $\frac{n}{m}\lambda$ is the batch arrival rate;

• $dm/n$ is the probability a server is probed when $dm$ servers
are sampled;

• $\phi$ is a $(dm - 1)$-vector that denotes the queue lengths of
the other $dm - 1$ sampled servers, so

\[ P(\phi) = \prod_{k=1}^{dm-1} \pi_{\phi_k}; \]

and

• $P(j \mid \phi, i)$ is the probability that a server's queue length
becomes $j$ when the server is sampled and is in state $i$,
and the states of the other $dm - 1$ sampled servers are $\phi$.

Without loss of generality, assume $\phi_k \le \phi_l$ if $k < l$, i.e.,
$\phi$ is ordered. Recall that batch-filling dispatches tasks using
water filling among the sampled $dm$ queues. Therefore, given
$i$ and $\phi$, either $j = i$ if no task is assigned to the server,
or $j$ takes two possible values. Consider a simple example in
Figure  3  where  three  tasks  will  be  dispatched  to  four  servers 
with  queue  lengths  1,  1,  4,  and  4.  Then  the  servers  whose 
queue  size  is  4  will  not  receive  any  task,  and  the  servers  whose 
queue  size  is  1  will  receive  one  or  two  tasks. 


Fig. 3: An example of water filling


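Assume ties are broken uniformly at random. The values of
$P(j \mid \phi, i)$ are summarized below.

• If

\[ \sum_{k=1}^{dm-1} (i - \phi_k) 1_{\phi_k \le i-1} \ge m, \tag{2} \]

which means that the tasks will be assigned to servers whose
original queue sizes are smaller than $i$, then

\[ P(j \mid \phi, i) = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{if } j \ne i. \end{cases} \]

• If condition (2) does not hold, then the server with queue
size $i$ will receive some tasks, and

\[ P(j \mid \phi, i) = \begin{cases} 1 - \alpha_{\phi,i} & \text{if } j = Q_{\phi,i} - 1, \\ \alpha_{\phi,i} & \text{if } j = Q_{\phi,i}, \end{cases} \]

where

\[ Q_{\phi,i} = \min\left\{ j : (j - i) + \sum_{k=1}^{dm-1} (j - \phi_k) 1_{\phi_k \le j-1} \ge m \right\}, \]

which is the maximum size a queue can be filled up to
during the water filling, and $\alpha_{\phi,i}$ is given by

\[ \alpha_{\phi,i} = \frac{m - (Q_{\phi,i} - 1 - i) - \sum_{k=1}^{dm-1} (Q_{\phi,i} - 1 - \phi_k) 1_{\phi_k \le Q_{\phi,i}-1}}{1 + \sum_{k=1}^{dm-1} 1_{\phi_k \le Q_{\phi,i}-1}}, \]

which is the probability that a server receives one more task
after its queue size becomes $Q_{\phi,i} - 1$ during water-filling.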

While the transition rate $q^{(n)}_{i,j}$ in (1) is a complex expression
for finite $n$, the following lemma shows that $q^{(n)}_{i,j}$ converges
to some simple $q_{i,j}$ as $n \to \infty$. The proof of this lemma is
presented in the appendix.


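Lemma 2. Under batch-filling, the transition rates given distribution
$\pi$, denoted by $q^{(n)}_{i,j}(\pi)$, converge; specifically,

\[ \lim_{n \to \infty} q^{(n)}_{i,j}(\pi) = q_{i,j}(\pi), \]

where for $j \ne i$,

\[ q_{i,j}(\pi) = \begin{cases} 1 & \text{if } j = i - 1, \\ \lambda d (1 - \alpha_\pi) & \text{if } j = Q_\pi - 1 > i, \\ \lambda d \alpha_\pi & \text{if } j = Q_\pi > i, \\ 0 & \text{otherwise,} \end{cases} \]

and

\[ Q_\pi = \min\left\{ j : \sum_{i=0}^{j-1} (j - i)\pi_i \ge \frac{1}{d} \right\}, \tag{3} \]

\[ \alpha_\pi = \frac{\frac{1}{d} - \sum_{i=0}^{Q_\pi - 2} (Q_\pi - 1 - i)\pi_i}{\sum_{j=0}^{Q_\pi - 1} \pi_j} \in (0, 1]. \]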

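According to the lemma above, the queue length dynamics
of a single server, in the limit as the number of servers
goes to infinity, can be represented by the Markov chain in
Figure 4, where the up-crossing transitions are into only two
states $Q_\pi - 1$ and $Q_\pi$ due to water filling. Based on Lemma
2, we can calculate the stationary distribution of the queue
length of a single server in the large-system limit by finding
$\pi$ that satisfies the global balance equations [15].

Fig. 4: The queue-length Markov chain of a single server, in
the large-system limit, under batch-filling

Theorem 3. The stationary distribution of the queue length
of a single server in the large-system limit under batch-filling
is

\[ \pi^*_i = \begin{cases} 1 - \lambda, & i = 0, \\ (1 - \lambda)\lambda d (1 + \lambda d)^{i-1}, & 1 \le i \le Q_{BF} - 1, \\ 1 - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}, & i = Q_{BF}, \\ 0, & \text{otherwise,} \end{cases} \tag{4} \]

where

\[ Q_{BF} = \left\lceil -\frac{\log(1 - \lambda)}{\log(1 + \lambda d)} \right\rceil. \]

The expected queue length is

\[ -\frac{\log(1 - \lambda)}{\log(1 + \lambda d)} + O_\lambda(1). \]

Proof. We first show $Q_{BF} = Q_{\pi^*}$, where $Q_\pi$ is defined in (3).
Note that $Q_{BF} \ge 1$. If $\lambda$ and $d$ are such that $Q_{BF} = 1$, then
equivalently

\[ -\frac{\log(1 - \lambda)}{\log(1 + \lambda d)} \le 1, \]

which implies that $\frac{1}{1-\lambda} \le 1 + \lambda d$, or $\frac{1}{d} \le 1 - \lambda$. Then $\pi^* =
(1 - \lambda, \lambda, 0, \ldots)$ and

\[ Q_{\pi^*} = 1 = Q_{BF}. \]

If $\lambda$ and $d$ are such that $Q_{BF} > 1$, according to (3), to show
$Q_{\pi^*} = Q_{BF}$ we only need to show

\[ \sum_{i=0}^{Q_{BF}-2} (Q_{BF} - 1 - i)\pi^*_i < \frac{1}{d} \le \sum_{i=0}^{Q_{BF}-1} (Q_{BF} - i)\pi^*_i. \tag{5} \]

Let LHS and RHS denote the left-hand side and the right-hand
side of (5). Then

\[ \text{LHS} = \sum_{i=0}^{Q_{BF}-2} \sum_{j=0}^{i} \pi^*_j = (1 - \lambda)\frac{(1 + \lambda d)^{Q_{BF}-1} - 1}{\lambda d} \]

and

\[ \text{RHS} = \sum_{i=0}^{Q_{BF}-1} \sum_{j=0}^{i} \pi^*_j = (1 - \lambda)\frac{(1 + \lambda d)^{Q_{BF}} - 1}{\lambda d}. \]

Then (5) is equivalent to

\[ Q_{BF} - 1 < -\frac{\log(1 - \lambda)}{\log(1 + \lambda d)} \le Q_{BF}, \]

which holds according to the definition of $Q_{BF}$.

We next check the global balance equations, writing $\alpha^* = \alpha_{\pi^*}$. For $i = 0$,

\[ \pi^*_0 (q_{0,Q_{BF}} + q_{0,Q_{BF}-1}) - \pi^*_1 q_{1,0} = (1 - \lambda)\lambda d - (1 - \lambda)\lambda d = 0. \]

For $1 \le i \le Q_{BF} - 2$,

\[ \pi^*_i (q_{i,i-1} + q_{i,Q_{BF}} + q_{i,Q_{BF}-1}) - \pi^*_{i+1} q_{i+1,i} = (1 - \lambda)\lambda d (1 + \lambda d)^{i-1}(1 + \lambda d) - (1 - \lambda)\lambda d (1 + \lambda d)^{i} = 0. \]

For $i = Q_{BF} - 1$,

\[ \begin{aligned} & \pi^*_{Q_{BF}-1} (q_{Q_{BF}-1,Q_{BF}-2} + q_{Q_{BF}-1,Q_{BF}}) - \sum_{i=0}^{Q_{BF}-2} \pi^*_i q_{i,Q_{BF}-1} - \pi^*_{Q_{BF}} q_{Q_{BF},Q_{BF}-1} \\ &= (1 - \lambda)\lambda d (1 + \lambda d)^{Q_{BF}-2}(1 + \lambda d \alpha^*) - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-2}\lambda d (1 - \alpha^*) - \left(1 - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}\right) \\ &= (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}(\lambda d \alpha^* + 1) - 1. \end{aligned} \]

From the definition of $\alpha^*$ we can verify that

\[ \alpha^* = \frac{1}{\lambda d (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}} - \frac{1}{\lambda d}. \]

So we have

\[ \pi^*_{Q_{BF}-1} (q_{Q_{BF}-1,Q_{BF}-2} + q_{Q_{BF}-1,Q_{BF}}) - \sum_{i=0}^{Q_{BF}-2} \pi^*_i q_{i,Q_{BF}-1} - \pi^*_{Q_{BF}} q_{Q_{BF},Q_{BF}-1} = 0. \]

For $i = Q_{BF}$,

\[ \begin{aligned} \pi^*_{Q_{BF}} q_{Q_{BF},Q_{BF}-1} - \sum_{i=0}^{Q_{BF}-1} \pi^*_i q_{i,Q_{BF}} &= \left(1 - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}\right) - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}\lambda d \alpha^* \\ &= 1 - (1 - \lambda)(1 + \lambda d)^{Q_{BF}-1}(1 + \lambda d \alpha^*) \\ &= 0. \end{aligned} \]

So the global balance equations hold.

Finally, the expected queue length in the stationary distribution
is

\[ \begin{aligned} \pi^*_1 + 2\pi^*_2 + \cdots + Q_{BF}\pi^*_{Q_{BF}} &= \sum_{i=0}^{Q_{BF}-1} \left(1 - \sum_{j=0}^{i} \pi^*_j\right) \\ &= \sum_{i=0}^{Q_{BF}-1} \left(1 - (1 - \lambda)(1 + \lambda d)^{i}\right) \\ &= Q_{BF} - (1 - \lambda)\frac{(1 + \lambda d)^{Q_{BF}} - 1}{\lambda d} \\ &= -\frac{\log(1 - \lambda)}{\log(1 + \lambda d)} + O_\lambda(1). \end{aligned} \]

□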
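
The statement of Theorem 3 can be checked numerically. The following sketch builds the distribution (4), uses the closed form of $\alpha^*$ from the proof, and evaluates the net probability flow out of each state under the limiting rates of Lemma 2. The function names are hypothetical and the code is an illustration under these assumptions, not part of the paper.

```python
import math


def batch_filling_stationary(lam, d):
    """Return (pi, alpha) with pi from (4) and alpha* from the proof of Theorem 3."""
    growth = 1 + lam * d
    q_bf = math.ceil(-math.log(1 - lam) / math.log(growth))
    pi = [1 - lam]
    pi += [(1 - lam) * lam * d * growth ** (i - 1) for i in range(1, q_bf)]
    pi.append(1 - (1 - lam) * growth ** (q_bf - 1))
    alpha = 1 / (lam * d * (1 - lam) * growth ** (q_bf - 1)) - 1 / (lam * d)
    return pi, alpha


def balance_residuals(pi, alpha, lam, d):
    """Outflow minus inflow for every state, using the limiting rates of Lemma 2."""
    q = len(pi) - 1  # largest state with positive probability

    def rate(i, j):
        if j == i - 1:
            return 1.0                       # service completion
        if j == q - 1 and j > i:
            return lam * d * (1 - alpha)     # filled up to level Q - 1
        if j == q and j > i:
            return lam * d * alpha           # filled up to level Q
        return 0.0

    residuals = []
    for i in range(q + 1):
        out_flow = sum(pi[i] * rate(i, j) for j in range(q + 1) if j != i)
        in_flow = sum(pi[j] * rate(j, i) for j in range(q + 1) if j != i)
        residuals.append(out_flow - in_flow)
    return residuals
```

With $\lambda = 0.99$ and $d = 1.1$ the list `pi` has indices 0 through 7, which matches the maximum queue size of 7 quoted earlier for batch-filling, and the residuals vanish up to floating-point error.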


B.  The  stationary  distribution  under  batch-sampling 

Recall that in batch-sampling, the $m$ tasks are routed to the least-loaded
$m$ queues among the sampled $dm$ queues. Consider a
server with queue size $i$ and assume it is probed. Then the
server will receive a task with probability

\[ \begin{aligned} & E\left[ \min\left\{ 1, \frac{\left( m - \sum_{j=0}^{i-1} \sum_{k=1}^{dm-1} 1_{\phi_k = j} \right)^+}{1 + \sum_{k=1}^{dm-1} 1_{\phi_k = i}} \right\} \right] \\ &= E\left[ \min\left\{ 1, \frac{\left( \frac{m}{dm-1} - \sum_{j=0}^{i-1} \frac{\sum_{k=1}^{dm-1} 1_{\phi_k = j}}{dm-1} \right)^+}{\frac{1}{dm-1} + \frac{\sum_{k=1}^{dm-1} 1_{\phi_k = i}}{dm-1}} \right\} \right] \\ &\approx \min\left\{ 1, \frac{\left( \frac{m}{dm-1} - \sum_{j=0}^{i-1} \pi_j \right)^+}{\frac{1}{dm-1} + \pi_i} \right\}, \end{aligned} \]

where $(x)^+ = \max\{x, 0\}$.

Following a similar analysis as batch-filling, we can establish
the following lemma. The details are omitted.

Lemma 4. Under batch-sampling, the transition rates given
distribution $\pi$, denoted by $q^{(n)}_{i,j}(\pi)$, converge; specifically,

\[ \lim_{n \to \infty} q^{(n)}_{i,j}(\pi) = q_{i,j}(\pi) = \begin{cases} 1 & \text{if } j = i - 1, \\ \lambda d & \text{if } i + 1 = j \le Q_\pi - 1, \\ \lambda d \alpha_\pi & \text{if } i + 1 = j = Q_\pi, \\ 0 & \text{otherwise,} \end{cases} \]

where

\[ Q_\pi = \min\left\{ i : \sum_{j=0}^{i-1} \pi_j \ge \frac{1}{d} \right\} \]

and

\[ \alpha_\pi = \frac{\frac{1}{d} - \sum_{j=0}^{Q_\pi - 2} \pi_j}{\pi_{Q_\pi - 1}} \in (0, 1]. \]

The Markov chain in the large-system limit is shown in
Figure 5. Given $\pi$, the Markov chain is a birth-death process
up to state $Q_\pi$. The stationary distribution can again be
calculated using the global balance equations. The results are
presented in Theorem 5, and the details are omitted.

Fig. 5: The Markov chain in the large-system limit under
batch-sampling

Theorem 5. The stationary distribution of the queue length of
a single server in the large-system limit under batch-sampling
is

\[ \pi^*_i = \begin{cases} 1 - \lambda, & i = 0, \\ (1 - \lambda)\lambda^i d^i, & 1 \le i \le Q_{BS} - 1, \\ 1 - (1 - \lambda)\frac{\lambda^{Q_{BS}} d^{Q_{BS}} - 1}{\lambda d - 1}, & i = Q_{BS}, \\ 0, & \text{otherwise,} \end{cases} \]

where, for $\lambda d \ne 1$,

\[ Q_{BS} = \left\lceil \frac{\log\frac{d-1}{d(1-\lambda)}}{\log(\lambda d)} \right\rceil. \]

The expected queue length is

\[ -\frac{\log(1 - \lambda)}{\log(\lambda d)} + O_\lambda(1). \]

C. The stationary distribution under the-power-of-d-choices

For a system with non-batch (single) arrivals, the stationary
queue-length distribution of a single server in the large-system
limit under the-power-of-d-choices has been established in
[10], [19]. The-power-of-d-choices routing under our batch-arrival
model also satisfies the same limiting queue-length
distribution, which we provide below for comparison purposes.

Theorem 6. The stationary distribution of the queue length
of a server in the infinite system under the-power-of-d-choices
is

\[ \pi_i = \lambda^{\frac{d^i - 1}{d - 1}} - \lambda^{\frac{d^{i+1} - 1}{d - 1}}. \]

The expected queue length is

\[ -\frac{\log(1 - \lambda)}{\log(\lambda d)} + O_\lambda(1). \]
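
The mean-field formulas for the three algorithms can be compared numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the batch-sampling threshold is the one implied by the condition $\sum_{j<i} \pi_j \ge 1/d$ of Lemma 4 with $\pi_j = (1-\lambda)(\lambda d)^j$ (valid for $\lambda d \ne 1$), and each mean is computed as the sum of the tail probabilities $P(\text{queue length} > i)$.

```python
import math


def max_queue_bf(lam, d):
    """Largest queue length under batch-filling (Theorem 3)."""
    return math.ceil(-math.log(1 - lam) / math.log(1 + lam * d))


def max_queue_bs(lam, d):
    """Largest queue length under batch-sampling; assumes lam * d != 1."""
    return math.ceil(math.log((d - 1) / (d * (1 - lam))) / math.log(lam * d))


def mean_queue_bf(lam, d):
    # P(queue <= i) = (1 - lam) * (1 + lam * d) ** i for i < Q_BF
    return sum(1 - (1 - lam) * (1 + lam * d) ** i for i in range(max_queue_bf(lam, d)))


def mean_queue_bs(lam, d):
    # P(queue <= i) = (1 - lam) * (r ** (i + 1) - 1) / (r - 1) for i < Q_BS
    r = lam * d
    return sum(1 - (1 - lam) * (r ** (i + 1) - 1) / (r - 1)
               for i in range(max_queue_bs(lam, d)))


def mean_queue_pod(lam, d):
    # P(queue >= i) = lam ** ((d ** i - 1) / (d - 1)); stop once terms are negligible
    total, i = 0.0, 1
    while True:
        term = lam ** ((d ** i - 1) / (d - 1))
        if term < 1e-15:
            return total
        total += term
        i += 1
```

For $d = 1.1$ and $\lambda = 0.99$, `max_queue_bf` and `max_queue_bs` give 7 and 26, the values quoted in Section II.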


IV.  Differential  Equations  and  Kurtz’s  Theorem 

The  results  in  the  previous  section  were  obtained  using  the 
mean-field  analysis  which  assumes  that  the  queues  are  i.i.d. 
across  servers.  We  will  justify  the  mean-field  analysis  in  this 
section. 

Again,  we  will  focus  on  batch-filling.  The  same  results  can 
be  established  for  batch-sampling  and  the-power-of-d-choices 
by  following  similar  steps.  We  first  consider  the  following 
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non-linear  system  described  by  differential  equations: 

dxj  _ 

d  t 


-(1  +  Ad)xi  +  xi+i 

i—1 

i  <  Xx  —  2 

Ad(l  —  ax)  Xj  —  (1  +  A dax)xi  +  Xi+ 1 , 

i  =  Xx-  1 

3= 0 

Adax  ^  ^  Xj  —  Xi  -\-  Xi- |_i, 

i  =  Xx 

3=0 

Xi  +  Xi- |_i 

otherwise, 

where 

f  i-1  i  ] 

Xx  =  min  ^  j  :  ^(j  -  l)xt  >  - 

l  ;=o  a  J 

and 

_  3  —  ~~  1  “  j)xJ 

C*x  “ 

Z^=o 


Theorem  7.  Assume  the  initial  condition  s(0)  satisfies  1  = 
si(0)  >  S2  (0)  >  •••  >  0  and  (ii)  |s(0)|  <  oo.  Starting  from 
s(0),  the  system  converges  to  the  equilibrium  point  sast-> 
oo,  where  ■  \  is  the  1-norm.  O 

Next  define  11^  (t)  to  be  number  of  servers  with  queue 
size  i  in  the  nth  system,  and  Tt\n\t)  =  to  be  the 

fraction  of  servers  with  queue  size  i  in  the  nth  system.  Here  we 
deliberately  reuse  notation  n  because  in  the  steady  state,  the 
fraction  of  servers  with  queue  size  i  is  equal  to  the  probability 
that  the  queue  size  of  a  server  is  i.  However,  note  that  here 
TT1'  n\t)  is  a  random  vector  instead  of  a  distribution.  Define  the 
vector  e  N°°  such  that  its  ith  component  r^(i)  = 

n(/l)  (f)  is  the  number  of  servers  whose  queue  lengths 

are  at  least  i,  7 00(f)  =  r  n  ^ ,  and  7  such  that  7*  = 
for  7r  defined  in  (4). 

The  following  theorem  states  that  7 ^n\t),  which  is  stochas¬ 
tic,  coincides  with  s(t)  for  any  bounded  time  interval  [0,  t] 
when  n  — >  00.  Here  we  define  U  to  be  the  space  of  all 
sequences  7  such  that 


These  differential  equations  are  derived  from  the  Markov 
chain  in  Figure  4.  View  xt  as  the  fraction  of  queues  with 
length  i.  Consider  x,  for  i  <  Xx  —  2.  According  to  Figure  4, 
x,  decreases  with  rate  x,:  x  (1  +  Ad)  because  the  queue  size 
of  a  server  with  size  i  becomes  i—  1  with  rate  1  and  becomes 
Xx  —  1  or  Xx  with  total  rate  Ad;  and  x,  increases  with  rate 
Xi+i  because  a  queue  with  size  i  +  1  becomes  a  queue  with 
size  i  with  rate  1.  Note  this  is  a  non-linear  system  because  ax 
and  Vx  depend  on  the  state  x. 

We  further  define 

OO 

si{t) =  'y  ]  xj(t) 
i=i 

for  i  >  0,  which  is  related  to  the  fraction  of  the  servers  with 
queue  size  >  i.  and 

OO 

Si  =  ^2  nj 

j—i 

for  7r  defined  in  (4).  Note  that  so(i)  =  1  for  any  t.  The 
differential  equations  of  the  non-linear  system  can  be  written 
in  terms  of  s (t)  as  follows: 

_ 

d  t 

Ad  (1  -{-  Adi) Si  +  1  f  Xs  1, 

i- 1 

<  A  Ad  ^  ^(1  Sj)  Si  -f-  1  —  Xs  (7) 

j—0 

—Si  +  s,;+i  otherwise, 

where 

(  i_1  1 
Xs  =  max  <  i :  ^^(1  —  Sj)  <  — 

{  3= 0 

The  following  theorem  establishes  the  equilibrium  point  and 
the  stability  of  this  non-linear  system.  The  proof  is  presented 
in  the  appendix. 


$$1 = \gamma_0 \ge \gamma_1 \ge \cdots \ge 0 \qquad (8)$$

with the 1-norm. The proof is presented in the appendix.

Theorem 8. Suppose that $\gamma^{(n)}(0) \to s(0)$ in probability, where $s(0)$ is a deterministic initial condition such that $s(0) \ge 0$ and $|s(0)| < \infty$. Then the following holds:

$$\lim_{n \to \infty} \sup_{0 \le u \le t} |\gamma^{(n)}(u) - s(u)| = 0 \quad \text{in probability.} \qquad \diamond$$

This result is motivated by Kurtz's theorem [7]. However, we remark that $\Gamma^{(n)}(t)$ is not a classical density dependent Markov chain because $q_l^{(n)}$ cannot be written in the form of $n\beta_l$ for some $\beta_l$ independent of $n$, and $\gamma^{(n)}$ is an infinite-dimensional vector. Therefore, the proof of Kurtz's theorem does not directly apply. Our proof is a non-trivial extension of Kurtz's theorem.

We also remark that $|s(0)| = \sum_i i\, x_i(0) < \infty$ is related to the average queue size at a server, so the condition simply requires that the average queue length per server be bounded initially.
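The identity behind this remark can be checked by exchanging the order of summation. The worked step below is a sketch that assumes the 1-norm sums over indices $i \ge 1$ and uses $s_i(0) = \sum_{j \ge i} x_j(0)$:

```latex
\begin{align*}
|s(0)| = \sum_{i \ge 1} s_i(0)
       = \sum_{i \ge 1} \sum_{j \ge i} x_j(0)
       = \sum_{j \ge 1} \sum_{i=1}^{j} x_j(0)
       = \sum_{j \ge 1} j\, x_j(0).
\end{align*}
```

Since $x_j(0)$ is the fraction of servers with queue length $j$, the right-hand side is the average queue length per server at time 0.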

Theorem 7 and Theorem 8 establish the following result:

$$\gamma^{(n)}(t) \xrightarrow{\,n \to \infty\,} s(t) \xrightarrow{\,t \to \infty\,} \hat{\gamma}, \qquad (9)$$

which further implies that

$$x(t) \to \hat{\pi}. \qquad (10)$$

A direct consequence of (10) is that if $\tilde{\pi}^{(n)}$ converges to some $\tilde{\pi}$, or a subsequence of $\tilde{\pi}^{(n)}$ converges to some $\tilde{\pi}$, then $\tilde{\pi} = \hat{\pi}$. The convergence of stationary distributions will be discussed in the next section.

V.  Convergence  of  the  Stationary  Distributions 

We  first  present  a  theorem  on  the  interchange  of  limits. 
The theorem is similar to Theorem 5.1 in [1]. However, [1] assumes that the state space of each system is finite, whereas in our system the state space of each queue is the set of nonnegative integers. Although the proofs are similar, we present the proof here for completeness.

Theorem 9. Consider a sequence of random processes $X^{(n)}$ indexed by a scaling parameter $n$, where $X^{(n)}(t)$ is a vector that denotes the value of the process at time $t$, and a dynamic system $\dot{X}(t) = F(X)$. Assume $X^{(n)}$ and $X$ satisfy the following assumptions:

• (A1) Suppose that for any $n$,

$$X^{(n)}(t) \Rightarrow \tilde{X}^{(n)} \quad \text{as } t \to \infty, \qquad (11)$$

where $\tilde{X}^{(n)}$ is the stationary distribution of the random process and $\Rightarrow$ denotes weak convergence.

• (A2) Suppose for each finite $t$,

$$X^{(n)}(t) \Rightarrow X(t) \quad \text{as } n \to \infty, \qquad (12)$$

when

$$\lim_{n \to \infty} X^{(n)}(0) = X(0),$$

where both $X^{(n)}(0)$ and $X(0)$ are deterministic initial conditions, and $X(0) \in \mathcal{X}$, where $\mathcal{X}$ is a set of initial conditions.

• (A3) Starting from each initial condition $X(0) \in \mathcal{X}$, assume that

$$\lim_{t \to \infty} X(t) = \hat{X}. \qquad (13)$$

• (A4) Any subsequence of $\tilde{X}^{(n)}$ has a subsubsequence that weakly converges. The limit of any convergent subsequence, denoted by $\tilde{X}$, satisfies $P(\tilde{X} \in \mathcal{X}) = 1$ and its support is separable.

Then $\tilde{X}^{(n)} \Rightarrow \hat{X}$. $\diamond$

This result establishes an interchange of limits because, from (A2) and (A3), we have

$$\lim_{t \to \infty} \lim_{n \to \infty} X^{(n)}(t) = \lim_{t \to \infty} X(t) = \hat{X}.$$

The theorem says that with additional assumptions, we further have

$$\lim_{n \to \infty} \lim_{t \to \infty} X^{(n)}(t) = \hat{X}.$$

The proof is presented in the appendix.

By  utilizing  the  result  above,  we  show  the  convergence  of 
the  stationary  distribution  in  the  following  theorem. 

Theorem  10. 


Proof. Define

$$\mathcal{X} = \Big\{ \gamma : 1 = \gamma_0 \ge \gamma_1 \ge \cdots \ge 0,\ \sum_i \gamma_i < \infty \Big\},$$

which is separable because it is a subspace of $\ell^1 = \{ \gamma : \sum_i |\gamma_i| < \infty \}$, which is a separable metric space.

• (A1) holds due to Theorem 1.

• Note that $\lim_{n \to \infty} \gamma^{(n)}(0) = s(0)$ for deterministic initial conditions $\gamma^{(n)}(0)$ and $s(0)$ implies that $\gamma^{(n)}(0) \to s(0)$ in probability. Therefore, according to Theorem 8, given deterministic initial conditions $\gamma^{(n)}(0)$ and $s(0)$ such that $\lim_{n \to \infty} \gamma^{(n)}(0) = s(0)$, we have

$$\lim_{n \to \infty} \sup_{0 \le u \le t} |\gamma^{(n)}(u) - s(u)| = 0 \quad \text{in probability},$$

which implies weak convergence, so (A2) holds.

• (A3) is established in Theorem 7.

• To validate (A4), we consider the space $U$, which is the set of $\gamma$ satisfying (8), with the following norm used in [19]:

$$\|\gamma - \gamma'\| = \sup_{i > 0} \frac{|\gamma_i - \gamma'_i|}{i}.$$

Under this norm, the space $U$ is compact. Define $\tilde{\gamma}_i^{(n)} = \sum_{j \ge i} \tilde{\pi}_j^{(n)}$, where $\tilde{\pi}^{(n)}$ denotes the stationary distribution of $\pi^{(n)}(t)$. By Prokhorov's theorem [2], since $U$ is compact, any subsequence of $\tilde{\gamma}^{(n)}$ has a subsubsequence, denoted by $\tilde{\gamma}^{(n_k)}$, that weakly converges to a random vector $\tilde{\gamma}$ under $\|\cdot\|$. By the Skorohod representation theorem, there exists a sequence of random vectors with the same distributions that converges almost surely. By a slight abuse of notation, we assume $\tilde{\gamma}^{(n_k)}$ converges to $\tilde{\gamma}$ almost surely. Since $0 \le \tilde{\gamma}_i \le 1$, by the dominated convergence theorem, we have

$$\lim_{k \to \infty} E\big[ |\tilde{\gamma}_i^{(n_k)} - \tilde{\gamma}_i| \big] = 0 \quad \forall i. \qquad (14)$$

Define

$$f_k(\gamma) = \sum_{i=1}^{k} \gamma_i.$$

It is easy to verify that $f_k(\cdot)$ is a continuous and bounded function under the 1-norm. According to the definition of weak convergence, we have

$$E\Big[ \sum_i \tilde{\gamma}_i \Big] = \lim_{k \to \infty} E[f_k(\tilde{\gamma})] = \lim_{k \to \infty} \lim_{k' \to \infty} E\big[ f_k(\tilde{\gamma}^{(n_{k'})}) \big] \le c, \qquad (15)$$

where the last inequality is due to Theorem 1, which implies that

$$P(\tilde{\gamma} \in \mathcal{X}) = 1.$$

The uniform convergence of the series

$$\sum_{i=1}^{\infty} E\big[ |\tilde{\gamma}_i^{(n_k)} - \tilde{\gamma}_i| \big] \qquad (16)$$

is established in Appendix F. By Tonelli's theorem,

$$\lim_{k \to \infty} E\big[ |\tilde{\gamma}^{(n_k)} - \tilde{\gamma}| \big] = 0, \qquad (17)$$

which implies that $\tilde{\gamma}^{(n_k)}$ converges weakly to $\tilde{\gamma}$ in the 1-norm. Therefore (A4) holds.

□


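Based on the theorem above, we further have the following results, obtained by the same analysis used to derive (14) and (17).

Corollary 11.

$$\lim_{n \to \infty} E\big[ \tilde{\gamma}_i^{(n)} \big] = \hat{\gamma}_i \quad \forall i,$$

$$\lim_{n \to \infty} E\Big[ \sum_i \tilde{\gamma}_i^{(n)} \Big] = \sum_i \hat{\gamma}_i, \qquad (18)$$

$$\lim_{n \to \infty} E\big[ |\tilde{\gamma}^{(n)} - \hat{\gamma}| \big] = 0. \qquad (19)$$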

In the next corollary, we show that any $k$ queues are independently and identically distributed with distribution $\hat{\pi}$ in the large-system limit, where $k$ is a constant independent of $n$. The system is then said to be $\hat{\pi}$-chaotic [16]. We prove the result by showing that the unique stationary distribution of $k$ queues that satisfies the detailed balance equations in the large-system limit has a product form.

Corollary 12. Consider a set of $k$ servers, and without loss of generality, assume the servers are $1, 2, \cdots, k$. Let $\pi^{(n)}(Q_1, Q_2, \cdots, Q_k)$ denote the stationary distribution of the queue lengths of these $k$ servers. In the large-system limit, we have

$$\lim_{n \to \infty} \pi^{(n)}(Q_1, Q_2, \cdots, Q_k) = \prod_{i=1}^{k} \hat{\pi}_{Q_i},$$

i.e., the $k$ queues are independently and identically distributed with distribution $\hat{\pi}$. $\diamond$
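As an illustration of the product form, the sketch below builds a candidate limiting distribution and evaluates a joint probability for a few servers. It assumes the equilibrium is the fixed point obtained by setting the right-hand side of (7) to zero, which gives $\hat{s}_i = 1 - (1-\lambda)(1+\lambda d)^{i-1}$ while that expression is positive and zero afterwards. This closed form is a derivation made for the example and may be written differently in (4); the function names are hypothetical.

```python
import math


def equilibrium_tail(lam, d):
    """Tail probabilities s_i at the fixed point of (7), truncated where they hit zero."""
    s = [1.0]
    i = 1
    while True:
        val = 1.0 - (1.0 - lam) * (1.0 + lam * d) ** (i - 1)
        if val <= 0.0:
            break
        s.append(val)
        i += 1
    s.append(0.0)
    return s


def equilibrium_pmf(lam, d):
    """Queue-length distribution pi_i = s_i - s_{i+1}."""
    s = equilibrium_tail(lam, d)
    return [s[i] - s[i + 1] for i in range(len(s) - 1)]


def joint_probability(pmf, queue_lengths):
    """Product-form probability of the given queue lengths at distinct servers."""
    return math.prod(pmf[q] if q < len(pmf) else 0.0 for q in queue_lengths)


pmf = equilibrium_pmf(lam=0.7, d=1.3)
print(pmf)
print(joint_probability(pmf, [0, 1, 2]))
```

The joint probability is simply the product of the marginals, which is what the corollary asserts in the limit; for finite $n$ the queues are only approximately independent.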

VI.  Simulations 

In this section, we use simulations to evaluate the performance of the three load balancing algorithms in large, but finite-server, systems.

A.  Deterministic  Batch  Size 

We first considered systems with $n = 10{,}000$ servers and batch size $m = 100$. We evaluated the per-task and per-job delays of the three algorithms with different probe ratios $d$. Figures 6 and 7 show the per-task delays and per-job delays, respectively, when $\lambda = 0.7$. Figures 8 and 9 show the per-task delays and per-job delays, respectively, when $\lambda = 0.9$.

From these figures, we have the following observations.

• In terms of per-task delays, batch-filling matches the power-of-two-choices with $d = 1.3$ when $\lambda = 0.7$ and with $d = 1.2$ when $\lambda = 0.9$. Batch-sampling, on the other hand, requires $d = 1.6$ when $\lambda = 0.7$ and $d = 1.7$ when $\lambda = 0.9$ to achieve the same per-task delay as the power-of-two-choices. Furthermore, even with $d = 1$, the per-task delay of batch-filling is only slightly larger than that of the power-of-two-choices, whereas batch-sampling has a much larger per-task delay when $d = 1$ (10 versus 3 when $\lambda = 0.9$). Note that the per-job delay of batch-sampling with $d = 1$ has been omitted from the figure for readability.

• Batch-filling performs even better in terms of per-job delays. As we can see from Figures 7 and 9, batch-filling matches the power-of-two-choices even with $d = 1$. We believe this is because the maximum queue size of batch-filling is smaller than that of the power-of-two-choices when $d = 1$, even though the average queue size is larger. Batch-sampling requires larger probe ratios to match the per-job delays of the power-of-two-choices. This is because the maximum queue size of batch-sampling is larger than that of batch-filling, as shown in Table I.

B.  Random  Batch  Size 

In this set of simulations, we evaluated the performance of the algorithms under random batch sizes. We assume the batch size $M$ is a random variable such that with probability 0.5, $M$ is geometrically distributed with mean 75; and with probability 0.5, $M$ is geometrically distributed with mean 125. The other settings are the same as those used with fixed batch sizes. The results for $\lambda = 0.7$ are shown in Figures 10 and 11, and the results for $\lambda = 0.9$ are shown in Figures 12 and 13. We note that the conclusions of our previous simulations do not change with these modifications.
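The random batch-size model above can be sampled as in the following sketch. It assumes the geometric distributions are supported on $\{1, 2, \ldots\}$ (so a component with mean $\mu$ has success probability $1/\mu$) and uses inverse-transform sampling; the function name and the seed are hypothetical. Under this assumption the overall mean batch size is $0.5 \cdot 75 + 0.5 \cdot 125 = 100$, the same as the deterministic batch size used earlier.

```python
import math
import random


def sample_batch_size(rng, means=(75.0, 125.0)):
    """Draw M from an equal-weight mixture of two geometric distributions on {1, 2, ...}."""
    mean = rng.choice(means)      # pick one component with probability 0.5
    p = 1.0 / mean                # success probability of a geometric with this mean
    u = 1.0 - rng.random()        # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


rng = random.Random(0)
samples = [sample_batch_size(rng) for _ in range(200_000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean)
```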

VII.  Conclusions  and  extension 

In  this  paper,  we  proposed  a  new  load-balancing  algorithm, 
named  batch-filling,  which  uses  water-filling  to  attempt  to 
equalize  the  load  among  the  sampled  servers.  The  algorithm 
provides  a  much  lower  sample  complexity  than  the  power- 
of-two-choices  algorithm  for  the  same  delay  performance. 
Specifically,  it  only  needs  to  sample  slightly  more  than  one 
queue  per  task  to  match  the  per-job  delay  of  the  power-of- 
two-choices  algorithm. 
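The water-filling step can be sketched as a greedy procedure. The code below is a minimal illustration, assuming that the tasks of a batch are placed one at a time on the currently shortest sampled queue; the function name `batch_fill` and the example queue lengths are hypothetical, and probing, sampling, and service are not modeled.

```python
import heapq


def batch_fill(sampled_queues, m):
    """Assign m tasks one at a time to the currently shortest sampled queue."""
    heap = [(q, idx) for idx, q in enumerate(sampled_queues)]
    heapq.heapify(heap)
    added = [0] * len(sampled_queues)
    for _ in range(m):
        q, idx = heapq.heappop(heap)          # shortest queue among the sampled servers
        added[idx] += 1
        heapq.heappush(heap, (q + 1, idx))    # its length grows by one task
    return added


queues = [3, 0, 2, 5, 1]
added = batch_fill(queues, m=4)
final = [q + a for q, a in zip(queues, added)]
print(added, final)
```

The greedy rule always adds a task where the increase in the squared queue length is smallest, which is why it is a natural candidate for minimizing the sum of squared queue lengths over the sampled servers.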

We remark that the theoretical results of this paper can be extended to random batch sizes. Let $M^{(n)}(t)$ denote the batch size at time $t$ in the $n$th system. Assume $M^{(n)}(t)$ are i.i.d. across time $t$. The main results of this paper hold given that the sequence of random variables converges in distribution, is uniformly integrable, and satisfies $M^{(n)}(t) = O(\log n)$. In particular, Theorem 1 can be established by using the same idea that
the  Lyapunov  drift  of  water-filling  is  dominated  by  random 
routing.  Lemma  2  also  holds  because  converge  in 

probability. The differential equations remain the same under random batch sizes, so Theorem 7 is still valid. Finally, it is easy to verify that $D_i/(dm)$ converges in mean as $m \to \infty$, where $m = E[M^{(n)}]$.

Appendix A
Proof of Theorem 1

We ignore the superscript $(n)$ of $Q_k^{(n)}(t)$ as we will focus on the $n$th system. Define the Lyapunov function to be

$$V(Q(t)) = \sum_{k=1}^{n} Q_k^2(t).$$

Let $x, y \in \mathbb{N}^n$ denote the state of the Markov chains, and $q_{x,y}$ denote the transition rate from state $x$ to state $y$. According to the Foster-Lyapunov theorem for continuous-time Markov chains (see, for example, Theorem 9.1.8 in [15]), we consider

$$\sum_{y \ne x} q_{x,y} \big( V(y) - V(x) \big). \qquad (20)$$

Define $e_k$ to be the $1 \times n$ vector such that $e_k[k] = 1$ and $e_k[l] = 0$ for any $l \ne k$. Then

$$q_{x, x - e_k} \big( V((x - e_k)^+) - V(x) \big) \le -2 x_k + 1,$$

which corresponds to a departure at server $k$. Next define $\mathcal{Y}_x$ to be the set of possible states of the Markov chain when a batch arrival occurs while the system is in state $x$; then

$$\sum_{y \in \mathcal{Y}_x} q_{x,y} \big( V(y) - V(x) \big) \le_{(a)} \frac{\lambda n}{m} \Big( 2 \frac{m}{n} \sum_k x_k + m \Big) = 2 \lambda \sum_k x_k + \lambda n.$$

The inequality (a) can be established by comparing batch-filling with the load-balancing policy that places the $m$ tasks on a set of $m$ randomly selected servers, one task per server. Note that water-filling is the optimal solution to the following problem:

$$\min_a \sum_{k=1}^{dm} (a_k + Q_k)^2 \quad \text{subject to: } \sum_{k=1}^{dm} a_k = m, \quad a_k \in \mathbb{N} \ \ \forall k.$$

Therefore $\sum_{y \in \mathcal{Y}_x} q_{x,y} V(y)$ is minimized under water-filling, conditioned on the same set of $dm$ sampled queues, and inequality (a) holds.

Therefore, we have

$$\sum_{y \ne x} q_{x,y} \big( V(y) - V(x) \big) \le -(2 - 2\lambda) \sum_k x_k + n + \lambda n.$$

Therefore, the Markov chain is positive recurrent according to the Foster-Lyapunov theorem. Now assume the system is in the steady state; then we have

$$0 = E\Big[ \sum_{y \ne x} q_{x,y} \big( V(y) - V(x) \big) \Big]$$

Now  we  assume  i  <  Qn.  In  this  case,  the  queue  size 
of  server  1  becomes  >  Q  (Q  >  i)  after  water  filling  with 
probability 


<-  (2-2A)E 


Yxk 

.  k 


E 


+  n  +  An, 


min  <  1, 


'm-(Q-l-i)~  J2f=  o  (Q  ~  1  -  j)xj 

■yQ—  1 


which  implies  that 


1  +  Ejf=0  X3 

Similar  to  the  analysis  above,  it  can  be  shown  that 

m  -  (Q  -  1  -  i)  -  E?=o2  (<5  -  1  -  j)xi 


E 


n  ' 


<^±E<  1 


'<3—2/ 

_ ij=0 

i  +  EjTo1^- 


(2-2A)  “  1- A' 


converges  to 


Therefore,  the  theorem  holds  by  choosing  c  =  1/(1  —  A). 
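The drift bounds in this proof follow from elementary algebra. The worked steps below are a sketch; the batch arrival rate $\lambda n / m$ and the per-server selection probability $m/n$ under the comparison policy are assumptions inferred from the proof, and each server is taken to serve at rate 1.

```latex
\begin{align*}
\text{departure at server } k: \quad & (x_k - 1)^2 - x_k^2 = -2x_k + 1, \\
\text{one task added to server } k: \quad & (x_k + 1)^2 - x_k^2 = 2x_k + 1, \\
\text{batch arrival (comparison policy)}: \quad & \frac{\lambda n}{m} \cdot \frac{m}{n} \sum_k (2x_k + 1) = 2\lambda \sum_k x_k + \lambda n, \\
\text{total drift}: \quad & \sum_k (-2x_k + 1) + 2\lambda \sum_k x_k + \lambda n = -(2 - 2\lambda) \sum_k x_k + n + \lambda n, \\
\text{steady state}: \quad & \frac{1}{n} E\Big[ \sum_k x_k \Big] \le \frac{1 + \lambda}{2 - 2\lambda} \le \frac{1}{1 - \lambda}.
\end{align*}
```

The last inequality uses $1 + \lambda \le 2$, which is where the constant $c = 1/(1 - \lambda)$ comes from.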

Appendix  B 
Proof  of  Lemma  2 

Without  loss  of  generality,  assume  server  1  has  queue  size 
i  and  has  been  probed.  Now  given  any  j  >  0,  define 


-E?=o2(Q-i-j> 


Ej=0  n3 

For  Q  >  Qn  +  1,  according  to  the  definition  of  Qn.  we 
have 


1 


<3,-1 


1  'y  'j  ( Q-ir  j)nj  —  0- 


dm—  1 


xj  —  E^ 


For  Q  =  <E 


fe=i 

which is the number of probed servers with queue size $j$ without including server 1, and is the summation of $dm - 1$ i.i.d. Bernoulli random variables with mean $\hat{\pi}_j$. We further define $\mu_j = E[X_j] = (dm - 1)\hat{\pi}_j$.

Consider any $i$ such that $i \ge Q_{\hat{\pi}}$. The probability that server 1 receives a task in water filling is upper bounded by


3=0 


|-E?=v2(0.-  i-j)7 


EE 


Q"  1  ■ 

3=0 


For  Q  <  Q,  -  1, 


3-E?=o2(Q-i-j> 


_E= 0  *3 

■‘3=0 


E 


<  E 


=  E 


'™-EJ= o(*-i)E\  + 


1  +  Ej'=0  X3 


> 


a  ~  E?=0  3(Qtt  2  j ) 7 


EE 


*5”  2  Tj-  - 

2=0  A? 


E?=0  (Qw  J)Xj 


ij= 0 


>- 


EjQ=0  "(<2-  -  1  -  j)7T,  -  E.Eo  3(<3-  -  2  -  j) 


<3,-3/ 

2=0 


dm—  1 


1+E?=0^2  /  _ 

E  —  E^=o  —  JOaaEi 


y^,-2 

2^j=o  n3 


+1 


dm 


1 _ I  Ei 

i—l  '  Z-jj=0 


Jj=0  dm—  1 


=i. 


(21)  Therefore,  for  any  z  <  Q,  and  i  ^  j,  we  have 


\da-j. 


if  j  =  Q-n 


which  converges  to 


Qi  j  —  \  AcZ(l  c^7r) i  if  j  Qtt  1 


(24) 


(25) 


E?=o  *3 


(22) 


0, 

Hence,  the  lemma  holds. 


otherwise. 


as $m \to \infty$ because $X_j/(dm - 1)$ converges to $\hat{\pi}_j$ in distribution and the term inside the expectation is bounded and continuous in terms of $X_j/(dm - 1)$. According to the definition of $Q_{\hat{\pi}}$ in (3), we know that


Appendix  C 
Proof  of  Theorem  7 

Motivated  by  the  proof  in  [10],  we  consider  the  following 
Lyapunov  function 


x  <3,-1 

—j  ~  y (Q-n  —  j)n3  ~ 
2=0 


$$V(t) = \sum_{i=1}^{\infty} |s_i(t) - \hat{s}_i|.$$


Define $e_i = s_i - \hat{s}_i$, so the Lyapunov function can be written


so  (21)  — >  0  and, 

Qi_j  =0  i>  Qn  and  j  £  {i,i—  1}.  (23) 


as $V(t) = \sum_{i=1}^{\infty} |e_i(t)|$. We will analyze the upper right-hand derivative

$$\frac{dV(t)}{dt} = \lim_{t' \to t^+} \frac{V(t') - V(t)}{t' - t}$$

in three different cases.

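• In the first case, consider $s$ such that $X_s = Q_{BF}$. In this case, the differential equations can be written in terms of $e$ in the following form:

$$\frac{de_i}{dt} = \begin{cases} -(1 + \lambda d) e_i + e_{i+1}, & i \le Q_{BF} - 1, \\ \lambda d \sum_{j=0}^{i-1} e_j - e_i + e_{i+1}, & i = Q_{BF}, \\ -e_i + e_{i+1}, & \text{otherwise.} \end{cases}$$

Now for $i \le Q_{BF} - 1$,

$$\frac{d|e_i|}{dt} = \begin{cases} -(1 + \lambda d) e_i + e_{i+1} & \text{if } e_i > 0, \\ (1 + \lambda d) e_i - e_{i+1} & \text{if } e_i < 0, \\ |e_{i+1}| & \text{if } e_i = 0, \end{cases} \qquad (26)$$

which implies that

$$\frac{d|e_i|}{dt} \le -(1 + \lambda d)|e_i| + |e_{i+1}|, \quad i \le Q_{BF} - 1.$$

Similarly, we can obtain that

$$\frac{d|e_i|}{dt} \le \begin{cases} -|e_i| + \lambda d \sum_{j=1}^{i-1} |e_j| + |e_{i+1}| & \text{if } i = Q_{BF}, \\ -|e_i| + |e_{i+1}| & \text{if } i > Q_{BF}. \end{cases}$$

Combining the results above and the fact that $s_i(t) \to 0$ as $i \to \infty$ for any $t$, we conclude in this case,

$$\frac{dV(t)}{dt} = \sum_{i=1}^{\infty} \frac{d|e_i|}{dt} \le -|e_1|.$$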
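The bound in the first case comes from a telescoping sum. The following worked step is a sketch of that cancellation, writing $Q = Q_{BF}$ and using $|e_i| \to 0$ as $i \to \infty$:

```latex
\begin{align*}
\sum_{i=1}^{\infty} \frac{d|e_i|}{dt}
  &\le \sum_{i=1}^{Q-1} \big( -(1 + \lambda d)|e_i| + |e_{i+1}| \big)
     + \Big( -|e_Q| + \lambda d \sum_{j=1}^{Q-1} |e_j| + |e_{Q+1}| \Big)
     + \sum_{i > Q} \big( -|e_i| + |e_{i+1}| \big) \\
  &= -\sum_{i=1}^{\infty} |e_i| + \sum_{i=2}^{\infty} |e_i| \\
  &= -|e_1|.
\end{align*}
```

The two $\lambda d \sum_{j=1}^{Q-1} |e_j|$ contributions cancel exactly, so only the first term of the telescoping series survives.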


•  In  the  second  case,  consider  s  such  that  Xs  >  Qbf-  Then, 
similar  to  the  analysis  of  the  first  case,  we  have 

de,- 


df 


<  —(1  +  Ad)|e,;|  +  |ej_|_i|  V  i  <  Qbf  ~  1-  (27) 


We  next  consider  two  subcases. 

-  In  the  first  subcase,  sqbf  >  sqbf.  Note  that  s,  =  0 
for  any  i  >  Qbf,  so  we  have 

d|e» 


E 


i—Qh 


d  t 


-I  '-'S -I  -I 

Ed  €i  dSi 

~dt  =  ^ 


d  t 


Qbf  —  1 

=  A-A d 

3=0 
Qbf  —  1 

=  Xd  E  ei~eQBF 

3= 0 

Qbf~  1 

A d  ^3  I €j\- 

3=0 


^  ~\cQbf 


(28) 


Combining  (27)  and  (28),  we  obtain 
d  V(t) 


d  t 


<  — lel  I  -  \£Qbf  I  ^  °- 


-  In  the  second  subcase,  Sr>  <  §r>  ■  In  this  case 

VBF  VBF 

dkil 


E 


i=QsF  + 1 


d  t 


OO  ,  OO  , 

Ede^  ds^ 

~di  =  .  ^  ~~d t 

*=Qbf  +  1 

Qbf 

=  A  -  Ad  53  (1  -  Sj)  -  sgBF+1 

3=0 

Qbf 

=  A  -  Ad  53  (f  -  %) 
i=o 
Qbf 

+  Ad  E  ei  ~  eQBF+i 

3=0 

Qbf 

<Ad53  ki|-|cQBF+1|,  (29) 

3=0 

where  the  last  inequality  holds  due  to  the  definition  of 

Qbf  and  the  fact  that  egBF+1(f)  =  sgBF+1(f)  >  0 

for  any  f. 

Next,  given  sgBF  <  sQbf,  we  have 

dl£<5sF  I 


df 

ds/ 


df 

=  -  Ad  +  (1  +  A cL)sqbf  -  sqbf+1 
=  -  Ad  +  (1  +  A d)sQBF 

+  (1  +  A d)eQBF  -  £qbf+ i 

<  -  (1  +  Ad)|egBF|  +  |egBF+1|,  (30) 

where  the  last  inequality  holds  because  egBF  <  0,  and 

—  Ad  +  (1  +  A  d)sQBF 

=  -  Ad  +  (1  +  Ad)  (l  -  (1  -  A)(l  +  A d)QBF-^ 
=1-  (1- A)(l  +  A d)QBF 
<l-d-A)r4=0. 

Combining  inequalities  (27),  (29)  and  (30),  we  obtain 
d  V{t) 


d  t 


<-ki|<o. 


•  In  the  third  case,  consider  s  such  that  Xs  <  Qbf-  hi  this 
case,  we  first  have 

t  t  ^  =  -^,,+>1.  (3D 


df  4-^  df 

*-Qbf+1  *-Qbf+1 


and 


de. 
d  t 


-  <  ~ (1  +  Ad) | Cf |  +  |e*+i|  V  i  <  Xs.  (32) 


We next further consider the following subcases.
Assume $s_{X_s} < \hat{s}_{X_s}$, so


dlcA. 

d£ 


xs-i 


—  —A  +  A d  ^  (1  —  Sj)  +  sXs  —  Sa's+i 


3=0 


Note  that  Sj  —  s,;+i  =  Ad(l  —  Sj)  for  any  i  <  Qbf,  so 

-%l  =  -A+Adf;(i-jj) 

3=0 

As-1 

-  Ad  ^  Cj  +  ejfa  -  £xs+i 
3=0 

Xs 

<  -  A  +  Ad^(l  -  Sj) 

3=0 

As  — 1 

+  Xd  ^2  leil  —  leAsl  +  leAs+il-  (33) 

3=0 

Next  for  Xs  <  i  <  Qbf ,  we  have 
de,; 


where  inequality  (a)  holds  due  to  the  definition  of  Xa, 
and  the  last  inequality  holds  because 

Ad  —  (1  +  A  d)§xB  +  s.ys+i  =  0 
when  Xs  <  Qbf- 

The  summation  of  (34)  and  (35)  yields  that 


Ql 


E 

i — .Xg  +  l 


dM 

d  t 


xs 


<A  -  Ad^(l  -  §j)  -  |e.Ya+i|  +  | cq£ 


?+i 


3=0 

As 


<A-  Ad^(l  -  Sj) 

3=0 

As 

-  Ad^  €j  -  |e_Ys+i|  +  |£qbf+1| 


3=0 


As 


<AdE  le3'l  ~  leA's  +  ll 
3=0 


tQBF+ 1  I  ’ 


d  t 


—  ~  Si  Si+i 

=  —  Si  +  Sj+i  —  e,  +  e,;+ 1 

=Ad  —  A  dsi  —  Ci  +  £j+i, 


(37) 

where  the  last  inequality  holds  due  to  the  definition  of 
Xs.  The  summation  of  (31),  (32),  (36)  and  (37)  yields 

d  V(t) 


d  f 


<  -|ei  <  0. 


which  implies  that 

d|e,; 


In  a  summary,  we  have  shown  that 

dV(f)  (  <  0,  if  s(t)  7^  s 


df 


<  Ad(l  -  Si)  -  |e,|  +  |ei+i|  VXs<i<  Qbf- 

(34) 


df 


=  0,  otherise. 


(38) 


For  i  =  Qbf ,  we  have 


de, 


Qi 


df 


sQbf  3”  SQbf+1 

SQbf  3”  ^Qbf  +  I  €Qbf  3”  ^Qbf  +  I 
Qbf  1 

=A-Ad  (1  -  Sj)  -  €qbf  +  eQBF+i, 

3=0 


which  implies  that 

dko 


d  t 


Qbf  —  1 

<  A  -  Ad  ^  (1  -  Sj)  -  | cqe 

3=0 


I£Qf 


M-l 


Next  define  i*  =  min{i  :  e,  <  0}.  If  such  an  i*  exists, 
since  Sj  =  0  for  any  i  >  Qbf,  **  <  Qbf-  Furthermore,  if 
Xs  <  Qbf,  then  i*  <  Xs.  It  is  easy  to  verify  that  when  i* 
exists, 

d  |  Ci* — 1 1  de,;*  — i  .  ..  .  .  . 

— dT“  =  ~dT  =  ~(1  +  Ad)lE‘--1|-le‘-l' 

Since  the  following  bound  has  been  used  throughout  in  the 
proof 

— -  ^  ^  <  —  (1  +  Ad)|e,*_i|  +  |ej*  |, 

when  i*  exits,  we  can  further  obtain 

|.  dV{t) 

df 


<  —  |ei|  -  M> 


(39) 


d V(t)  „ 
df 

Assume  sx  >  sx  ,  then 
As— l 


(35)  which  implies 

(35),  we 

0, 

if 

s(f)  = 

=  s. 

dV(t) 

< 

0, 

if 

Si{t) 

<  Sj  for  some  i 

-N<o. 

d  t 

< 

0, 

if 

Si(f) 

>  h 

< 

\  — 

0, 

if 

s,(t) 

>  Sj  Vi  and  si(f) 

(40) 


dleAs 

d  t 


—A  —  Ad  ^  (1  —  Sj)  —  sXb  +  sXs 


+i 


3=0 

As 


The result above shows that $|s(t) - \hat{s}|$ is non-increasing. For any $x$ such that $|x| < \infty$, we define

$$S_x = \{ y : |y_n| \le |x_n| \text{ for all } n \}.$$


Then  we  can  see  that  Sx  is  compact  since  we  can  approximate 
—A  Ady^(l  Sj )  +  Ad(l  sXb)  sAa  +  sA|^y.  tail  with  e/2  and  the  first  finitely-many  elements  are 

1,-0  in  an  equivalent  Euclidean  space  and  hence  the  the  finite- 

<(a)Ad  —  (1  +  A  d)sXB  +  s^s+1 


<  —  (1  +  Ad)|exJ  +  |e_Ya+i| 


dimensional  part  is  totally  bounded  with  the  remaining  e/2 
(36) as well.
Since $|s(t) - \hat{s}|$ is non-increasing, given a fixed $r > 0$ and initial condition $s(0) \in B(\hat{s}, r)$, where

$$B(\hat{s}, r) = \{ s \in \mathcal{X} : \|s - \hat{s}\| \le r \},$$

we have

$$|s(t)| \le r + |\hat{s}| \quad \forall t.$$


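$N\big( \frac{n}{m} \lambda t \big)$, i.e., a Poisson random variable with mean $\frac{n}{m} \lambda t$. Define event $B_{n,a}$ to be

$$B_{n,a} = \Big\{ B_n(t) \le (1 + a) \frac{n}{m} \lambda t \ \text{ and } \ \sum_i \gamma_i^{(n)}(0) \le (1 + a) \sum_i s_i(0) \Big\}.$$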

Since  s\(t)  >  S2(t)  >  ...  >,  there  exists  N(r)  such  that  for 
any  i  >  N(r)  and  any  t  >  0, 

Si(t )  < 

s*+i(t)  <  0. 

Now consider any initial state $s(0) \in \mathcal{X}$. Let $r = \|s(0) - \hat{s}\|$ and

$$s'_i = \begin{cases} 1 & \text{if } i < N(r), \\ s_i(0) & \text{if } i \ge N(r). \end{cases}$$

Then $s' \in \mathcal{X}$. Let $\Omega = B(\hat{s}, r) \cap S_{s'}$. Since both $B(\hat{s}, r)$ and $S_{s'}$ are closed and $S_{s'}$ is compact, we have that $\Omega$ is compact.

Also note that for any initial state $s(0) \in \Omega$ we have $s(t) \in \Omega$ as well, so $\Omega$ is positively invariant and compact.

Furthermore, given $s(t)$ such that $s_1(t) = \hat{s}_1$ and $s_i(t) \ge \hat{s}_i$ ($i \ge 2$), it can be easily shown that $s_1(t + \delta t) > \hat{s}_1$ for a sufficiently small $\delta t$ unless $s(t) = \hat{s}$. The result can be proved by following the idea of LaSalle's invariance principle [8].

Appendix  D 
Proof  of  Theorem  8 

Recall the definition of $\Gamma^{(n)}(t) \in \mathbb{N}^{\infty}$, where the $i$th component $\Gamma_i^{(n)}(t)$ is the number of servers whose queue lengths are at least $i$. Since $\Pi^{(n)}(t)$ can be uniquely determined by $\Gamma^{(n)}(t)$ and vice versa, and $\Pi^{(n)}(t)$ is a Markov chain, $\Gamma^{(n)}(t)$ is a Markov chain and we have

$$\Gamma^{(n)}(t) = \Gamma^{(n)}(0) + \sum_{L \in \mathbb{N}^{\infty}} L\, N_L \Big( \int_0^t R_L^{(n)}(\Gamma^{(n)}(u))\, du \Big), \qquad (41)$$

Applying the Chernoff bound, we obtain

$$P\Big( B_n(t) \le (1 + a) \frac{n}{m} \lambda t \Big) \ge 1 - e^{-\frac{n}{m} \lambda t\, h(a)},$$

where $h(a) = (1 + a) \log(1 + a) - a$. Also

$$\lim_{n \to \infty} P\Big( \sum_i \gamma_i^{(n)}(0) \le (1 + a) \sum_i s_i(0) \Big) = 1$$

because $\gamma^{(n)}(0)$ converges to $s(0)$ in probability according to the assumption of the theorem. Thus, we have

$$\lim_{n \to \infty} P(B_{n,a}) = 1.$$
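The Chernoff bound for a Poisson random variable used here can be checked numerically. The sketch below compares the exact upper tail $P(N > (1+a)\mu)$ of a Poisson variable with mean $\mu$ against $e^{-\mu h(a)}$ for a few illustrative values; the function names and the chosen values of $\mu$ and $a$ are hypothetical and are not taken from the paper.

```python
import math


def h(a):
    """Rate function h(a) = (1 + a) log(1 + a) - a."""
    return (1.0 + a) * math.log(1.0 + a) - a


def poisson_upper_tail(mu, threshold):
    """P(N > threshold) for N ~ Poisson(mu), by summing the pmf up to floor(threshold)."""
    pmf = math.exp(-mu)
    cdf = pmf
    for k in range(1, int(math.floor(threshold)) + 1):
        pmf *= mu / k
        cdf += pmf
    return max(0.0, 1.0 - cdf)


results = []
for mu in (5.0, 20.0, 50.0):
    for a in (0.1, 0.5, 1.0):
        tail = poisson_upper_tail(mu, (1.0 + a) * mu)
        bound = math.exp(-mu * h(a))
        results.append((mu, a, tail, bound))
        print(mu, a, tail, bound)
```

In the proof, $\mu$ plays the role of $\frac{n}{m} \lambda t$, which grows with $n$ when $m = O(\log n)$, so the bound on the tail vanishes in the limit.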

Note that $n \sum_i \gamma_i^{(n)}(u)$ is the total number of tasks in the system at time $u$. When $B_{n,a}$ occurs,

$$\max_{0 \le u \le t} \sum_i \gamma_i^{(n)}(u) \le (1 + a) \Big( \lambda t + \sum_i s_i(0) \Big).$$

Define $C_a = (1 + a)\big( \lambda t + \sum_i s_i(0) \big)$. When the inequality above holds, we have

OO  /7 

7 in\u)  =  V  0  <  «  <  t,  Vi,  (42) 


which  further  implies  that  for  k  = 


KM) 


we  have 


where $N_L(x)$ are independent standard Poisson processes and $R_L^{(n)}(\Gamma)$ is the transition rate of the Markov chain from state $\Gamma$ to state $\Gamma + L$. For example, given

$$L = (0, -1, 0, \cdots),$$

which corresponds to the event that there is a departure from a server with queue size 1,

$$R_L^{(n)}(\Gamma^{(n)}) = \Gamma_1^{(n)} - \Gamma_2^{(n)}$$

because there are $\Gamma_1^{(n)} - \Gamma_2^{(n)}$ servers with queue size 1. Dividing by $n$ on both sides of equation (41), we get

$$\gamma^{(n)}(t) = \gamma^{(n)}(0) + \sum_{L \in \mathbb{N}^{\infty}} \frac{L}{n} N_L \Big( \int_0^t R_L^{(n)}(n \gamma^{(n)}(u))\, du \Big).$$

Now define $B_n(t)$ to be the total number of batch arrivals within time interval $[0, t]$ in the $n$th system. Then $B_n(t) =$

')\n\u)  <  1  ^1  —  ^  V0  <  u  <t,  Vz  >  k.  (43) 

Next we define the following four sets:

• $\mathcal{T}_n^+$: the set of $L$ such that $L \ge 0$, which is the set of $L$ related to arrivals,

• $\mathcal{L}_n^+$: the set of $L$ such that $L \ge 0$ and $L_i = 0$ for $i \ge k + 1$,

• $\mathcal{T}_n^-$: the set of $L$ such that $L \le 0$, which is the set of $L$ related to departures,

• $\mathcal{L}_n^-$: the set of $L \le 0$ and $L_i = 0$ for $i > m$.

We further define $\tilde{N}_L(a) = N_L(a) - a$, which is a centered Poisson process. Then we have

$$\gamma^{(n)}(t) = \gamma^{(n)}(0) + \sum_{L \in (\mathcal{T}_n^+ \cup \mathcal{T}_n^-) \setminus (\mathcal{L}_n^+ \cup \mathcal{L}_n^-)} \frac{L}{n} N_L \Big( \int_0^t R_L^{(n)}(n \gamma^{(n)}(u))\, du \Big) + \sum_{L \in \mathcal{L}_n^+ \cup \mathcal{L}_n^-} \frac{L}{n} \tilde{N}_L \Big( \int_0^t R_L^{(n)}(n \gamma^{(n)}(u))\, du \Big) + \sum_{L \in \mathcal{L}_n^+ \cup \mathcal{L}_n^-} \frac{L}{n} \int_0^t R_L^{(n)}(n \gamma^{(n)}(u))\, du.$$

$|\gamma^{(n)}(u) - s(u)| \le 4 \delta e^{M u}$ for any $u \in [0, t]$. Thus

$$P\Big( \sup_{0 \le u \le t} |\gamma^{(n)}(u) - s(u)| \le 4 \delta e^{M t} \Big) \ge P(B_n) \to 1$$

as $n \to \infty$.

Define $s(t)$ to be the solution of the differential equations (7) with initial condition $s(0)$, and $F(s)$ such that the nonlinear differential equations in (7) are given by

$$\frac{ds}{dt} = F(s).$$

Following the idea behind the proof of Kurtz's theorem (see [5] for an easy exposition), we have

$$\sup_{0 \le u \le t} |\gamma^{(n)}(u) - s(u)| \le$$

(44)  Lemma  13. 


7W(0)  -s(0) 


(45) 


sup 

0<u<£ 


sup 

0<n<£ 


sup 


sup 

0<u<£ 


E  R£\n7W(T))dT^ 


L£(£  +  U£“) 


(46) 


E  ^^£4"WnE))dr) 


Le£;u£ 


(47) 


E  L  [U  dr 

,  ^  Jo 

Lec+uc- 

PlL 

-  F(7(n)(r))dr 

PU  pu 

/  F(7^(t))  dr  —  /  F(s(r))  dr 

Jo  Jo 


(48) 

(49) 


P  ((46)  >S)<  +  2(1  -  P  (Bn  Q)). 

o  dm 


Proof.  Note  that  L  £  TJ  \  CAf  occurs  when  a  task  is 
dispatched  to  a  queue  with  size  at  least  k.  Under  condition 
(43),  when  a  batch  arrival  occurs. 


(J  {my  ->•  ny  +  L} 
vl &rf\c+ 

<  P  (dm  —  Zk  <  m)  =  P  (Zk  >  (d  —  l)ro) 

<  g  2(d+l)m 


According  to  Lemmas  13-15,  we  obtain  that  there  exists  n 
such  that  for  any  n>n, 


(  sup  ( 

7  (n\u)  —  s  (u) 

\0 <u<t  v 

F(7(n)(r))  -  F(s(t))  dr  ^  >  4^ 
<p(|7(n)(0)-s(0)|  >  j)  +  3(1 -P  (£„,„)) 

=  { 


+  AmK  max  \  e 


—  ^  Xth( 


(m  +  l)^ Xt\6  nth(2^t 


At  (d~1)2  m  A tCa 
<5  8m  ' 

which  converges  to  zero  as  n  — >  oo  since  m  =  0(log?r). 
Let 


Bn  = 


sup  [  7^(u)-s(u) 
o <u<t 

ru 

/  F(7(n)(r)) -F(s(r))dr 


<  4  S 


Then  P(2?n)  — >  1  as  n  — >  oo.  When  Bn  occurs,  for  any 
u  €  [0,  t], 


7  ^n\u)  —  s  (u) 


<  A5  + 


PU 

/  F(7(n)(r))  -F(s(r))dr 


/*u  I 

<  A5  +  M  /  7^(t)  —  s(r) 


dr, 


Jo  1  1 

where  the  last  inequality  holds  because  F(s)  is  Lipschitz 
as  shown  in  Lemma  16.  By  Gronwall’s  inequality  we  have 


where  Zk  is  the  number  of  servers  probed  with  queue  size  at 
least  k  and  the  last  inequality  is  obtained  from  the  Hoeffding’s 
inequality  for  sampling  without  replacement.  Therefore,  we 
have 


£  ^ 

LgT„+\£^  n 


(jo 


+  1  -  P^ra.a) 


<P 


n  ,  ;<#  if 

—  Ate  2(<i+1) 
m 


At  _L±zl)_ 

- g  2(d+l) 


+  1  —  P  (Bnta), 


+  1  —  P(£>ra,a) 


where inequality (a) holds because $N(t)$ is nondecreasing in $t$ and the last inequality is obtained from the Markov inequality.

Similarly, we can also obtain


sup 

0  <u<£ 


<P  (  -N 
n 


E  ^P(Tr>'(n)(r))  dT) 


LGT„“\£ 
ft 


10 


E  dr  )  >  5 

l eT~\c~ 


<  P  |  B(n,  a)  p|  ^  ^  W"}  W)  dr  j  >  <sj  which  j 


We  divide  the  analysis  into  the  following  cases: 

•  For  i  >  m,  Li  =  0  for  any  L  £  Cf  UjC~,  which  implies 

>  6  |  p"}(7)=0and 

E  |jFi(")('T)  -  -Pi  (7 )  =  E  Fi  W  =  7m+i  - 

i>m  i>m 

For  m>i>  k, 

P"}(  7)  =  — 7i  +  7i+l) 

implies  that 


+  1  —  P(SnjCl) 

<  P  (  -AT  (  nXt—)  >  6  )  +  1  -  P (Bn  a ) 
\n  \  m  /  1 

<^  +  l-P  (Bn>a). 

dm 


□ 


Lemma  14. 


1  ((47)  >  d)  <  4mfe  max  je 


-  A )  -nth{ 


Proof.  Note  that  |£+  U  £n  |  <  mk  +  m  <  2 mk .  For  L  £  £+, 


P  (  sup 

VO  <ii<£ 


n 


-Wl  (  /  f?Ln)(n7(n)(T))  dr 


> 


2mfc 


m  ,  m 

< P [  sup  — 

VO <u<t  Tl 


<2e~^xth<'^£t\ 


JVL  - 


(  —  Am) 
Vm  / 


> 


2mfc 


where  the  last  inequality  follows  from  Proposition  5.2  in  [5], 
Similarly,  for  L  G  £~, 

S 


P  [  sup 

V0<u<£ 


^iVL  -Rl  )(n7(”)(r))  dr) 


> 


2  mk 


1 

<P  (  sup  -  |iVL  (rat)  I  > 


0<u<£  ^ 
—nth(  sk  ) 


2mk 


<2e 

Combining  the  results  above  and  using  the  union  bound,  we 
obtain 

P((47)  >5)  <4mfemaxje  mAt,Am+:i)*Ape_nth(i^5i)| . 


□ 


Lemma  15.  There  exists  n  such  that  for  any  n  >  n, 

P  ((48)  >  <5)  <  1  —  P  (Bn,a)  • 

Proof.  To  study  (48)  under  condition  (42),  we  define 

F(n)(7)  =  ^  E  L4"W 

Le£+U£“ 

and  consider 

|f(”) (7)  -  F(7)|  =  £  |pn)(7)  -  Fi( 7) 


For  k  >  i, 


Pn)(7)-F,(7) 


1  /  An 


=  0. 


-pf  (7)  =  -  —  S[A|7]  -  nji  +  n7i+i 


n  \  to 


=  AdE 


-A 

dm 


7 


-  7»  +  7*+i> 


'  D,; 

=  AdE 

dm 

7 

where  /A,  is  a  random  variable  denoting  the  change  in  the 
number  of  servers  with  queue  size  at  least  i  after  water 
filling.  Therefore, 

*f}(7)-A(7) 

Recall  Zi  to  be  the  number  of  probed  servers  with  queue 
size  at  least  i,  so  Dt  is  a  function  of  Zj  ( j  <  i )■ 
Specifically, 

f  (  i_1 

Di  =  min  <  dm  —  Zi,  to  —  ^^( dm  —  Zj 

I  V  3=0 


(51) 


Therefore, 


A: 


dm 


dm  —  Zi  II 
dm  ’  1  d 


-U'- 


j=0 


ff}_ 

dm 


Applying  the  Hoeffding’s  inequality  for  sampling  without 
replacement,  we  have  that 

P  (|  Zi  —  7,:d?n|  >  \J  m  log  to  J  <  2e_2—A  = 

which  implies  that 


(| Zi  —  7idm|  <  sj to  log  m  Vi  <  k'j 


>  1  - 


TO2 /d  ’ 

2  k 


TO 


2/d  ' 


Given  |Zj  —  7idm|  <  y7 to  log  to  for  all  i  <  k,  we  can 

obtain 


j 

r  , 

E 

'  -Di 
dm. 

7 

—  min  < 

l  ' 

i-1 


J=0 


< 


/c-ydogTO 


(50) 


By summarizing the cases above, we obtain that under condition (42)

$$|F^{(n)}(\gamma) - F(\gamma)| \le \frac{C_a}{m} + \frac{k \sqrt{\log m}}{d \sqrt{m}}.$$

Therefore, given $\delta$, there exists $m_\delta$ such that for any $m \ge m_\delta$,

If $h_{X_s}(s) > h_{X_s}(s')$, then


sup 

0<n<£ 


r>U  (>U 

/  F(n)(7(n)(r))  dr-  /  F(7(n)(r))dr 
Jo  Jo 


Xa. 


to  +  £  A 


to  dy/m 

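So for sufficiently large $n$,

$$P\Big( \sup_{0 \le u \le t} \Big| \int_0^u F^{(n)}\big(\gamma^{(n)}(r)\big)\,dr - \int_0^u F\big(\gamma^{(n)}(r)\big)\,dr \Big| > \delta \Big) < 1 - P(B_{n,a}).$$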


□ 


Lemma 16. $F(s)$ is Lipschitz.

Proof  Consider  s,  s'  £  N°°.  Without  loss  of  generality  Xs  < 
Xs/.  Define 

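$$h_i(s) = F_i(s) + s_i - s_{i+1}.$$

Then

$$\begin{aligned}
|F(s) - F(s')| &= \sum_{i=1}^{\infty} |F_i(s) - F_i(s')| \\
&\le \sum_{i=1}^{\infty} \big( |s_i - s'_i| + |s_{i+1} - s'_{i+1}| + |h_i(s) - h_i(s')| \big) \\
&\le 2|s - s'| + \sum_{i=1}^{\infty} |h_i(s) - h_i(s')|.
\end{aligned}$$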

Recall that $F_i(s) = -s_i + s_{i+1}$ for $i > X_s$ and $F_i(s) = \lambda d - (1 + \lambda d)s_i + s_{i+1}$ for $i < X_s$, so

$$|F(s) - F(s')| \le 2|s - s'| + \lambda d \sum_{i=1}^{X_s - 1} |s_i - s'_i| + \sum_{i=X_s}^{X_{s'}} |h_i(s) - h_i(s')|.$$


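We next consider two cases. If $h_{X_s}(s) \le h_{X_s}(s')$, then

$$\begin{aligned}
\sum_{i=X_s}^{X_{s'}} |h_i(s) - h_i(s')| &= \lambda d - \lambda d s'_{X_s} - \lambda + \lambda d \sum_{j=1}^{X_s - 1} (1 - s_j) \\
&\quad + \lambda d \sum_{i=X_s+1}^{X_{s'}-1} (1 - s'_i) + \lambda - \lambda d \sum_{j=1}^{X_{s'}-1} (1 - s'_j) \\
&= \lambda d \sum_{j=1}^{X_s - 1} (s'_j - s_j) \\
&\le \lambda d \sum_{j=1}^{X_s - 1} |s'_j - s_j| \\
&\le \lambda d |s - s'|.
\end{aligned}$$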


5Z  lfti(S)  “  /l»(S')l 

i= As 

As 

=  —  Ad  T  Xds x  V  Ad  —  A dsxs  T  A  —  Ad  ^  )(1  —  Sj) 

3= 1 

As  As 

+  A  -  Ad^(l  -  s')  -  A  +  Ad 53(1  -  Sj) 
j=i  i=i 

As 

+  A  -  Ad 53(1  -  Sj) 

3= 1 

As 

<Ad|s^  —  s  yj  +  Ad  53  I  Sj  ~  Sj| 

i-t 

<2Ad|s  —  s'  | , 

where the first inequality holds because

$$\lambda - \lambda d \sum_{j=1}^{X_s} (1 - s_j) \le 0$$

according to the definition of $X_s$.

Combining the results above, we obtain that

$$|F(s) - F(s')| \le (2 + 3\lambda d)|s - s'|.$$

Therefore, the lemma holds.


□ 
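The Lipschitz constant $2 + 3\lambda d$ can be probed numerically. The sketch below is not the paper's code: it assumes that $|\cdot|$ is the $\ell_1$ norm, that the state is truncated to finitely many coordinates with $s_{K+1} = 0$, and that $h_i(s)$ equals $\lambda d(1 - s_i)$ below level $X_s$, the leftover arrival mass at level $X_s$, and zero above it. All function names are illustrative.

```python
import random


def drift(s, lam, d):
    """Assumed drift F(s); s[i-1] is the fraction of servers with queue >= i."""
    ext = list(s) + [0.0]                 # append s_{K+1} = 0
    remaining = lam                       # arrival mass not yet assigned
    out = []
    for i in range(len(s)):
        # h_i(s): full rate below X_s, leftover mass at X_s, zero above
        h = min(lam * d * (1.0 - ext[i]), remaining)
        remaining -= h
        out.append(h - ext[i] + ext[i + 1])
    return out


def l1(u, v):
    """l1 distance between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def lipschitz_slack(s, t, lam, d):
    """(2 + 3*lam*d) * |s - t| - |F(s) - F(t)|; nonnegative if the bound holds."""
    return (2 + 3 * lam * d) * l1(s, t) - l1(drift(s, lam, d), drift(t, lam, d))


rng = random.Random(1)
s = sorted((rng.random() for _ in range(8)), reverse=True) + [0.0] * 4
t = sorted((rng.random() for _ in range(8)), reverse=True) + [0.0] * 4
print(lipschitz_slack(s, t, lam=0.9, d=2))
```

A nonnegative slack on random pairs does not prove the lemma, but a negative value would point to a mistake in the assumed form of $F$.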


Appendix  E 
Proof  of  Theorem  9 

Let $X^{(n_k)}$ denote the weakly convergent subsequence in assumption (A4). By (A1) and the Skorohod representation theorem, there exist $\{\hat X^{(n_k)}\}$ and $\hat X$ such that
• $\hat X^{(n_k)} =_d X^{(n_k)}$,

• $\hat X =_d X$, and

• $\hat X^{(n_k)}$ converges to $\hat X$ almost surely.

Now let $X^{(n_k)}(0) = \hat X^{(n_k)}$, i.e., the $n_k$th system starts at a random initial condition specified by its stationary distribution, which implies that

$$X^{(n_k)}(t) =_d X^{(n_k)} \quad \forall t.$$

Denote by $X(t)$ the random state of the dynamical system starting from the random initial condition $\hat X$. According to (A2), for any deterministic initial condition in $\mathcal{X}$,

x<nfe)(f)  A  x(f). 

By the definition of weak convergence, for a bounded continuous function $f$,


lim  E 

n—f  oo 


=  E 


/(X(”fe)(f))  |X("fc)(0)  =  X("fe) 
/(X(i))|X(”fc)(0)  =x 


Since $f$ is bounded, further by the bounded convergence theorem and the fact that $P(\hat X^{(n_k)} \in \mathcal{X}) = P(\hat X \in \mathcal{X}) = 1$, we have

$$\lim_{n_k \to \infty} E\big[f(X^{(n_k)}(t))\big] = E\big[f(X(t))\big],$$

which implies that $X^{(n_k)}(t)$ converges weakly to $X(t)$ for any $t$.

Since $X^{(n_k)}(t) =_d X^{(n_k)}$ for all $t$, we further have $X(t) =_d X$ for all $t$. Now according to (A3), the dynamical system converges to its equilibrium point starting from any initial condition in $\mathcal{X}$, which implies that $X(t)$ converges to the equilibrium point almost surely and hence also weakly. Therefore, $X$ is a point mass at the equilibrium point, which implies that $X^{(n_k)}$ converges weakly to it. Since this holds for any convergent subsequence, the theorem holds.


Appendix  F 

Uniform  Convergence  of  the  Series  (16) 

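First, since $\sum_{i=1}^{\infty} E[\gamma_i] \le c$ according to (15) and $E[\gamma_i] \ge 0$ for all $i$, the sequence $s_b = \sum_{i=1}^{b} E[\gamma_i]$ is bounded above and increasing, so $s = \lim_{b \to \infty} s_b$ exists. Therefore, given any $\epsilon$, there exists $b_\epsilon$ such that for any $b \ge b_\epsilon$,

$$\sum_{i=b+1}^{\infty} E[\gamma_i] = s - s_b \le \epsilon.$$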

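Next we establish an upper bound on

$$\sum_{i=b+1}^{\infty} E\big[\gamma_i^{(n_k)}\big] \qquad (52)$$

using the Lyapunov-drift analysis at the steady state. Since

$$\sum_{i=1}^{\infty} E\big[\gamma_i^{(n_k)}\big] \le c$$

for any $n_k$ and $E[\gamma_i^{(n_k)}]$ is decreasing in $i$, we have $i\,E[\gamma_i^{(n_k)}] \le \sum_{l=1}^{i} E[\gamma_l^{(n_k)}] \le c$, i.e.,

$$E\big[\gamma_i^{(n_k)}\big] \le \frac{c}{i}$$

for any $i$ and any $n_k$. According to Markov's inequality, we have

$$P\Big(\gamma_i^{(n_k)} \ge \frac{d-1}{2d}\Big) \le \frac{c}{i} \cdot \frac{2d}{d-1}. \qquad (53)$$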


Now consider the $n_k$th system and define a Lyapunov function to be

$$V(Q(t)) = \sum_{j=1}^{n_k} \Big( \big(Q_j(t) - b + 1\big)^+ \Big)^2,$$

where $b > 0$ and the superscript $(n_k)$ of $Q$ is ignored to simplify the notation. Let $x$ and $y$ denote states of the Markov chain, and $q_{x,y}$ denote the transition rate from state $x$ to state $y$. According to the Foster-Lyapunov theorem for continuous-time Markov chains (see, for example, Theorem 9.1.8 in [15]), we consider

$$\sum_{y \ne x} q_{x,y} \big( V(y) - V(x) \big). \qquad (54)$$

Recall that $e_j$ is a $1 \times n_k$ vector such that $e_j[j] = 1$ and $e_j[l] = 0$ for any $l \ne j$. Then

$$q_{x,\,x-e_j} \Big( V\big((x - e_j)^+\big) - V(x) \Big) \begin{cases} = 0, & \text{if } x_j \le b - 1 \\ = -1, & \text{if } x_j = b \\ \le -2(x_j - b)^+, & \text{if } x_j > b, \end{cases}$$

which corresponds to a departure at server $j$.
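The three departure cases can be reproduced by evaluating the Lyapunov function directly. This is a minimal sketch that assumes the squared form $V(x) = \sum_j ((x_j - b + 1)^+)^2$ and a unit departure rate at every busy server; `lyapunov` and `departure_drift` are illustrative names.

```python
def lyapunov(x, b):
    """V(x) = sum_j ((x_j - b + 1)^+)^2."""
    return sum(max(q - b + 1, 0) ** 2 for q in x)


def departure_drift(x, j, b):
    """V((x - e_j)^+) - V(x): change in V when server j completes one task."""
    y = list(x)
    y[j] = max(y[j] - 1, 0)
    return lyapunov(y, b) - lyapunov(x, b)


b = 3
for queue_length in (2, 3, 5):
    # one server is enough to see the per-server change
    print(queue_length, departure_drift([queue_length], 0, b))
```

With `b = 3` the change is 0 for a queue of length 2, -1 for length 3, and -5 for length 5, where -5 equals $-2(x_j - b) - 1$ and is therefore at most $-2(x_j - b)^+$.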

Next define $\Gamma_x$ to be the set of possible states of the Markov chain when a batch arrival occurs and the system is in state $x$. Consider

$$\sum_{y \in \Gamma_x} q_{x,y} \big( V(y) - V(x) \big).$$

We note that batch-filling is one of the optimal solutions to the following problem:

$$\begin{aligned}
\min_{a} \quad & \sum_{k=1}^{dm} \Big( (a_k + x_k - b + 1)^+ \Big)^2 \\
\text{subject to:} \quad & \sum_{k=1}^{dm} a_k = m, \\
& a_k \in \mathbb{N} \quad \forall k,
\end{aligned}$$

where $\{x_k\}_{k=1,\dots,dm}$ are the sizes of the probed $dm$ queues. In other words, given $x$ and the set of $dm$ probed servers, batch-filling minimizes $V(y)$. This can be proved by showing that any task assignment can be modified to the batch-filling solution, by iteratively moving new tasks from large queues to small queues, without increasing the value of the objective function.
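The optimality claim can be checked by brute force on small instances. The sketch below assumes that batch-filling assigns the $m$ tasks one at a time to the currently shortest probed queue and that the objective is the squared positive part; the helper names are hypothetical.

```python
from itertools import product


def objective(x, a, b):
    """sum_k ((a_k + x_k - b + 1)^+)^2 for an assignment a."""
    return sum(max(ak + xk - b + 1, 0) ** 2 for ak, xk in zip(a, x))


def batch_fill(x, m):
    """Assign m tasks, each to the currently shortest probed queue."""
    a = [0] * len(x)
    for _ in range(m):
        k = min(range(len(x)), key=lambda i: x[i] + a[i])
        a[k] += 1
    return a


def brute_force_min(x, m, b):
    """Smallest objective over all integer assignments with sum(a) == m."""
    best = None
    for a in product(range(m + 1), repeat=len(x)):
        if sum(a) == m:
            value = objective(x, a, b)
            if best is None or value < best:
                best = value
    return best


x, m, b = [0, 2, 3, 5], 3, 3
a = batch_fill(x, m)
print(a, objective(x, a, b), brute_force_min(x, m, b))
```

The enumeration grows as $(m+1)^{dm}$, so it is only meant for a handful of probed queues; it illustrates the claim rather than replacing the exchange argument in the text.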

Given any $b \ge 2$, we consider the following two cases.

• First consider $x$ such that

$$x \in \Omega_b := \Big\{ x : \sum_j 1_{x_j \le b-2} \ge n_k \frac{d+1}{2d} \Big\}.$$

In other words, at least a $(d+1)/(2d)$ fraction of servers have queue size at most $b - 2$.

Define $\Psi_x$ to be the set of possible states of the Markov chain under batch-sampling when a batch arrival occurs and the system is in state $x$, and $g_{x,y}$ to be the corresponding transition rate. Since batch-filling minimizes $V(y)$ for any given set of probed servers, we have

$$\sum_{y \in \Gamma_x} q_{x,y} \big( V(y) - V(x) \big) \le \sum_{y \in \Psi_x} g_{x,y} \big( V(y) - V(x) \big).$$

Now under batch-sampling, a server may receive one (and at most one) task if it is probed. Consider server $j$ such that $x_j \ge b - 1$. Server $j$ is probed with probability $dm/n_k$, and will receive one task if it is among the $m$ least loaded queues in the $dm$ probed queues. Conditioned on server $j$ being probed, define $G_{b-2}$ to be the number of probed servers with queue size at most $b - 2$ among the other $dm - 1$ servers. According to Hoeffding's inequality for sampling without replacement, we get

$$P(G_{b-2} < m) \le e^{-\frac{(d-1)^2}{2d} m}.$$

Therefore, we conclude that

$$\begin{aligned}
\sum_{y \in \Gamma_x} q_{x,y} \big( V(y) - V(x) \big) &\le \sum_j \frac{\lambda n_k}{m} \times \frac{dm}{n_k} \times e^{-\frac{(d-1)^2}{2d} m} \big( 2(x_j - b + 1) + 1 \big)^+ \\
&\le \sum_j \lambda d\, e^{-\frac{(d-1)^2}{2d} m} \big( 2(x_j - b)^+ + 3 \big).
\end{aligned}$$

Note that $(y_j - b + 1)^+ = 0$ for any queue $j$ such that $x_j \le b - 2$ since each server is given at most one task under batch-sampling.

• Consider $x$ such that $x \notin \Omega_b$, i.e.,

$$\sum_j 1_{x_j \le b-2} < n_k \frac{d+1}{2d}.$$

In this case, we compare batch-filling with the randomized load-balancing algorithm that places $m$ tasks in a set of $m$ randomly selected servers, one for each server. According to the analysis in the proof of Theorem 1, we have

$$\begin{aligned}
\sum_{y \in \Gamma_x} q_{x,y} \big( V(y) - V(x) \big) &\le \frac{\lambda n_k}{m} \Big( m + 2 \frac{m}{n_k} \sum_j (x_j - b + 1)^+ \Big) \\
&= \lambda n_k + \sum_j 2\lambda (x_j - b + 1)^+ \\
&\le 3\lambda n_k + \sum_j 2\lambda (x_j - b)^+ .
\end{aligned}$$
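The exponent in the first case can be traced with a short calculation. The sketch below treats the $dm - 1$ other probed servers as $dm$ draws and ignores the removal of server $j$ from the population, an approximation that only affects lower-order terms; $t$ is an auxiliary symbol introduced here for the deviation from the mean.

```latex
\begin{align*}
E[G_{b-2}] &\approx dm \cdot \frac{d+1}{2d} = \frac{(d+1)m}{2}
  && \text{(at least a $(d+1)/(2d)$ fraction has queue size $\le b-2$)} \\
t &= \frac{(d+1)m}{2} - m = \frac{(d-1)m}{2}
  && \text{(gap between the mean and the threshold $m$)} \\
P(G_{b-2} < m) &\le \exp\!\Big(-\frac{2t^2}{dm}\Big)
  = \exp\!\Big(-\frac{(d-1)^2 m}{2d}\Big)
  && \text{(one-sided Hoeffding bound with $dm$ draws)}
\end{align*}
```

Since $m$ grows with $n$, this probability vanishes, which is why the contribution of the first case is damped by the factor $e^{-(d-1)^2 m/(2d)}$.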


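Furthermore, we note that

$$E\Big[ \frac{1}{n_k} \sum_j (x_j - b)^+ \Big] = \sum_{i=b+1}^{\infty} E\big[\gamma_i^{(n_k)}\big].$$

Therefore, given any $0 < \epsilon < 1$, there exist $\tilde b_\epsilon$ and $n_\epsilon$ such that for any $b \ge \tilde b_\epsilon$ and $n_k \ge n_\epsilon$, the following inequality holds:

$$\sum_{i=b+1}^{\infty} E\big[\gamma_i^{(n_k)}\big] \le \epsilon.$$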

Combining the result above and result (52), we conclude that given any $0 < \epsilon < 1$, for any $n_k \ge n_{\epsilon/2}$ and $b \ge \max\{b_{\epsilon/2}, \tilde b_{\epsilon/2}\}$,


£t=&+i  ‘ 


li 


'  H 


<  £“6+i£ 


7, 


-  E~6+1  E  [7»]  <  e. 


which  concludes  the  proof. 


Appendix  G 

Proof  of  Corollary  12 


Combining  the  results  above,  we  have  that 

E^.y(F(y)_vr(x)) 

<  ^  —  2  f  1  —  Amaxjl,  de~(  2d  m}  J  (xj  —  b)++ 

3  '  ' 

3nkXde  2d  mP(a:  G  fib)  +2AnfcP(a;  fL  Cij,) . 


Recall that the Markov chain is positive recurrent according to Theorem 1. Assuming the system is in the steady state, we have

$$0 = E\Big[ \sum_{y \ne x} q_{x,y} \big( V(y) - V(x) \big) \Big],$$


which  implies  that 


To simplify the notation, we assume $k = 2$; the analysis for $k > 2$ is almost identical and hence omitted here. Now for the $n$th system, we define $S^{(n)} = \{i : i > 2\}$, i.e., the set of all servers except servers 1 and 2. We consider the following Markov chain $(Q_1^{(n)}(t), Q_2^{(n)}(t), \eta^{(n)}(t))$, where

$$\eta_i^{(n)}(t) = \frac{\sum_{j \in S^{(n)}} 1_{Q_j^{(n)}(t) = i}}{n - 2},$$

i.e., the fraction of servers with queue size $i$ in $S^{(n)}$. Recall that $Q_1^{(n)}(t)$ is the queue length of the first server in the $n$th system, $Q_2^{(n)}(t)$ is the queue length of the second server in the $n$th system, and $Q_1$ and $Q_2$ are the queue lengths in the steady state. Denote by


E 


nk 


Y2(xj-b)A 


(d~  1) 


< 


3A de  “  ™  m  P(x  G  fib)  +  2A  P(x  fL  fib) 
2  ^1  —  Amax{l,  de~(  2d> 


Now  given  any  0  <  e  <  1,  define  n£  such  that  for  any 

n  >  n£ 

(d- 1)2 

3Ade  2d  m  e  (d-i)2 

2  (1  —  A)  *  2  ^  “  *  L 


Such  n£  exists  because  m  =  0(logn)  and  m  is  increasing 
function  of  n.  Furthermore,  define  bf  such  that  for  any  b  >  bf 
and  in  the  steady  state  of  any  n^th  system, 

2A  P(x  ^  fib)  ^  e 
2(1  -  A)  “  2' 

Such  be  exists,  independent  of  nk.  due  to  inequality  (53). 


$$\pi^{(n)}(x, y, \eta) = P\Big( \big(Q_1^{(n)}, Q_2^{(n)}, \eta^{(n)}\big) = (x, y, \eta) \Big),$$

i.e., the stationary distribution of the Markov chain. For the $n$th system, the global balance equation for a given state $(x, y, \eta)$ is

7T(n)  (x,  y,  rf)  Y1  rt),y,V)(S:’'ib'n) 

=  E  7t(n)  (x,y,t7)r((ji)-  -)(a:,i/,r/), 

(2»y,i7)7(2:»y,n) 

where  ^  (x,  y,  rj)  is  the  transition  rate  from  state  (x,  y,  rj) 
to  (x,  y,  fj)  in  the  ??th  system,  which  further  implies  that 

E  E  nin)  (*>  *?)  r!",E)  (*>  v) 

=  E  E  7T(n)  (5,^77)^- -)(a;,y,r7).  (55) 

n  (z,y,'n)dt{x,y,ri) 
Note that for $(\tilde x, \tilde y, \tilde\eta)$ such that $\tilde x = x$ and $\tilde y = y$,

Y  Y n(n)  (*»  y *  *j)  r£y,r,)  (+  22  > »/) 

rj  ?7^T7 

=  X]  Y  7r(”)  (*>  22’  *7)  rt?,y,v)  (+  22,  T7)  (56) 

rj  17^77 

by exchanging the notations $\eta$ and $\tilde\eta$. Furthermore, to transit to a state with $\tilde x > x$ and $\tilde y > y$, server 1 and server 2 need to be both probed, so

y  r<”> 

(x, y,rj)  '  m  n(n  —  1)  V  n  ) 


x>x,y>y 

which  implies  that 


5Z  7T<n)  (x,  y,rj)  J2  r(x,y,v)^y^^  =  0  (n)  (57) 

77  x>x,y>y 

since  ^  7r(n2  (x,  17,77)  <  1.  Similarly,  we  have 


7r^n)  (x,  y,  77)  r^y~  ^(x,  y,rj)  =  O  ( —  j  ■  (58) 


rj  x<x,y<y 

Note  that 


^x.y.r,)^  “  2/’  »7)  =  r£y,r7)(a:>  2/  “  !>  »7)  =  !» 


and 


Y  7r(n)  (+  2/  +  !,  »7)  r(xi+i^)  (+  Vi  v)  =  7r(n)  (®.  V  +  1)  • 


Now  we  consider 


YYYn(n)  (*’  22’  *7)  (*’  22’  n) 

tj  £  >#  rj 

=  5Z  5Z  ^  (*>  22’  *7)  53  itn)  (*’  2/,  »j) 

5>ic  77  rj 

=7T(n)  (x,  17)  ^3  ^  7r(n)  (77|x,  y)  Y  r(3,y,T7)  <+’  22’ T?) 


X>X  77 


=7T(n)  (x,  y)  En  (*.  22) 


x,y 


Where  r£y, „)(*»2/)  =  E^^y.ni(5’22’»7)- 


Note  that 

r(x,y,7j)(*’  22)  = 


which  implies  that 


Er- 


r(«) 

(x,y,ri) 


(x,y) 


x,y 


It  is  easy  to  show  that  Xj/dm  converges  weakly  to  7 7 
because  77  converges  weakly  to  7.  Hence,  we  have 


lim  E„ 

77 — OO 

and 


r("i,77)(i’22)  £’22  =9x,a( 7)’  x<x<Qbf , 


XX""’  (x,  y,  77)  i,  y,  77) 

=  Y  7r(n)  (+  22’  »7)  r(",)J/^)(a:’  22  -  !’  v) 

V 

=  7T(n)(x,y),  (59) 

53  7r(n)  (*  +  1, 22’  »7)  r("ii)3,, ,,)(*>  Vi  *7)  =  7r(n)  (*  +  1.2/). 


lim  N  Er 

77, — Vno  < 


„(*») 

( x,y,rj ) 


(x,y) 


X,y 


=  0,  x  <  Qbf, 


x>Qbf 


where  qx^{l)  and  Q is y  are  defined  in  Lemma  2.  Since  0  < 
7r(”2(x,  y)  <  1  and  0  <  E,, 
can  conclude  that 


„(") 


y 


(x,y,r))(^i  2/) 

^EEE  7r(n)  (*•  22’  »?)  (*>  22,  r)) 


<  dX.  we 


TJ  X~>X  TJ 


=  lim  7 r(+  (x,  y)  Y  Er 

77, — V  OO  ‘  ^  1 


r(n) 

(x,y,v) 


(x,y) 


x,y 


r(n) 

(x,y,v) 


(x,y) 


x,y 


(60) 


:>y,»7) 


(i’22) 


a;,  22 


(61) 


=  lim  n^n\x,y)  *y  lim  Er 

77, — y  OO  _  •  ^  n—>oo  1 

Qbf>x>x 

+  lim  n^n\x,y)  lim  E„ 

n — ^00  n—>oo  ‘ 

x>x>Qbf 

=7r(x,  y)  Y  9x,x(  7)- 

Qbf>x>x 

Similarly,  we  have 

E  5Z  E  ^(n)  (*•  22’  r?)  r((S,n)^’  22’  »7) 

TJ  X<X  fj 

=  Y  7T(x,y)qx,x(i)- 

x<x<Qbf 

Summarizing  the  results  above,  (55)  implies  that 

(  N 

<xiV)  Y  Vx,x{l)  +  9y,y(  7) 

\Qbf>x>x  QBF>y>y 

=  5Z  7r(x,y)qx,x(j)  +  5Z  7T(x,y)qg,y{'y)- 

x<x<Qbf  y<y<QBF 

It is easy to verify that the equation above is the detailed balance equation for two independent and identical Markov chains with transition rates given in Lemma 2, and the unique solution therefore is $\pi(x, y) = \pi_x \pi_y$ for $\pi$ defined in (4). This means that queue 1 and queue 2 are independent in the large-system limit.


